ABSTRACT


Evolutionary programming is a relatively new problem-solving approach in the
field of computer science. It attempts to model the processes of natural selection and
evolution to solve complex problems. The technique is powerful because it can be
applied to a wide range of problems and can sometimes find solutions that more traditional
techniques cannot.

This research attempts to augment the methodology of an evolutionary
programming approach with two new features: (1) dominant and recessive traits and (2)
intron and exon regions. These features form the basis of a specialized approach to
evolutionary programming that might be applicable to problem areas where evolutionary
programming usually performs poorly.

This specialized approach is applied to the well-known problem of a series
expansion, so that the results can easily be compared to a known solution and the
influence of these additional mechanisms on the population of solutions can be studied.
Results from implementing the new mechanisms individually and together are presented
and compared with a baseline evolutionary programming implementation.
I. INTRODUCTION


A. PURPOSE 


The idea of imbuing computers with life-like features, such as intelligence, is not
new. Early computer scientists, like Alan Turing, asked such questions as, “Can
machines think?” Turing never answered that question, but he did propose that an
intelligent machine would have to be a learning machine and that evolution could be used
as an intelligent learning process [Ref 1]. As a result of his work, and that of other
pioneering computer scientists, the link between computation and “life” found a firm
foundation from which sprouted the fields of Artificial Intelligence (AI) and Evolutionary
Computation (EC).

There are many arguments about the real definition and underlying purpose of
“evolution,” but one common theme among them is that evolution is
responsible for adaptation. In trying to develop models of how adaptation occurs in
natural organisms and to import those models to computers, John Holland created
Genetic Algorithms (GAs). Since their introduction in the 1960s, GAs have been
applied to pressing computational problems because they can search for
solutions to a specific problem among an intractable number of possibilities [Ref 2].

Genetic Programming (GP) is one of several techniques stemming from GAs.
While GP uses GAs in its implementation, it differs from a GA because it does not
attempt to solve a problem directly. Instead of encoding candidate solutions to the
problem, GP uses GA-style evolution to generate computer programs automatically; the
evolved programs then solve the problem without the computer being explicitly
programmed to do so.

This thesis describes a specialized GA and attempts to implement it using a GP
approach. This specialized GA incorporates certain genetic characteristics and
mechanisms that occur in complex biological organisms in an effort to create a
self-mutating algorithm that can be applied to adaptive applications.
B. RESEARCH QUESTIONS 


It is necessary to clearly establish which underlying biological and genetic
mechanisms are used in this specialized GA and to enumerate the benefits gained by
choosing those mechanisms. The general characteristics of these mechanisms must be
described unambiguously within an algorithm in order for those characteristics to map to
a computer-based analogue population.

Assuming that a specialized GA has been developed, it is necessary to determine
whether or not it achieves results comparable to those of traditional GAs when applied
to similar problems. If the new approach works but is prohibitively slow to complete,
then its usefulness, outside of being a novel approach, is limited. However, if this
specialized GA can solve problems that traditional GAs cannot, then it could maintain
legitimacy as a computationally viable problem-solving method.

Because this new approach is a GA, this thesis addresses two fundamental 
questions pertinent to all GAs: Which initial population size and probability of mutation 
should be used, and how many generations must be run before reliable results are 
produced? The GP portion of the implementation also requires that the problem to be
solved be broken down into its appropriate functions and terminals.
C. THESIS OUTLINE


Upon completion of the introduction, but prior to answering the research
questions posed, it is necessary to thoroughly explain some fundamental concepts in
molecular genetics and evolutionary computation. These explanations are presented in
Chapter II and are needed to understand the details of the computational analogues this
thesis develops. Additionally, work completed in related fields has had either a direct or
an indirect influence on the formation of this new idea. These seminal works are
discussed and summarized in Chapter III. Together, the background information in
Chapter II and the overview of previous works in Chapter III set the foundation for
describing why this thesis’ research questions are pertinent and how they will be
answered (Chapters IV and V, respectively).

The experiment was run on a modified version of the GPSYS (pronounced 
“sipsys”) software [Ref 3]. The modifications made are covered in Chapter V. A 
statistical analysis was done on the results, the outcome of which is given in Chapter VI.
Chapter VII summarizes the important contributions of this thesis.




D. EXPECTED BENEFITS OF THIS THESIS 


If it is possible to exhibit evolutionary/adaptive traits by using this specialized
GA, then it should be possible to address a larger spectrum of problems than is possible
with traditional genetic algorithms. If this specialized GA works without loss of
optimization, then the techniques discovered during this research could be used to
develop adaptive applications in the future. If a larger class of problems can be
addressed, the genetic algorithms resulting from this work might be applied successfully
where they previously could not be. Such self-adapting algorithms could potentially
benefit a wide range of DoD applications.


II. BACKGROUND 


A. AN OVERVIEW OF MOLECULAR GENETICS 


In 1839, two German microscopists, Matthias Schleiden and Theodor Schwann,
postulated that all organisms were constructed from fundamental units called cells, and
that all cells arise from other cells. Without having witnessed the mechanism of mitosis
or being able to see inside the nucleus, they accurately determined that the cell is the
fundamental unit of all organisms [Ref 4]. 

In 1858, Charles Darwin and Alfred Wallace published their theory proposing that 
evolution in organisms occurs because of natural selection. They stated that various forms
of life are not constant but are continually giving rise to slightly different forms, some of 
which are adapted to survive and multiply more effectively. At that time, they did not 
know the origin of this continuous variation, but they realized that these new 
characteristics had to persist in progeny if such variations were to form the basis of 
evolution [Ref 5]. 

In 1865, Gregor Mendel laid the groundwork for the rules of heredity when he
postulated that various traits are controlled by pairs of factors. Three years later, Ernst
Haeckel postulated that the nucleus is responsible for heredity. It was not until the early
1900s and the work of Hugo De Vries that all three biologists were proven correct, when
the pairs of chromosomes found in the cell nucleus were discovered to be the active
factors responsible for heredity. Although Darwin and Wallace’s theory of evolution was
to become orthodox among biologists by the late 1800s, it was not until the acceptance
of genetic theory that evolution theory would be embraced by the general world and other
scientific communities [Ref 6].
In 1868, a Swiss biologist named Friedrich Miescher first identified
deoxyribonucleic acid (DNA). It was not until 1944, with the work of Oswald Avery,
Colin MacLeod, and Maclyn McCarty, that the DNA found within the genes was believed to
be responsible for the physical transmission of hereditary traits from generation to
generation. Alfred Hershey and Martha Chase proved that DNA was the actual genetic
material in 1952. Since then, subsequent advances in molecular genetics, population
genetics, microscopy, and molecular biology have allowed scientists to better hypothesize
how organisms are able to evolve [Ref 7].


1. A Molecular Genetics Primer 


The following discussion of molecular genetics uses terminology familiar to
anyone knowledgeable about fundamental cell biology. Appendix A provides a brief review
of cell biology for readers who need to familiarize themselves with these terms.

In order to be absolutely clear about the genetic mechanisms that are being 
mapped to a computational analogue, it is necessary to ensure that there is a common, 
accepted understanding of what these mechanisms are and how they work. It is also 
necessary to introduce some genetic terminology that may not be familiar to most 
readers. Therefore, this section describes the basic structures and functions of genes. 

Every living organism is created from a complete set of instructions called a
genome. This genome contains the master blueprint for all cellular structures and
functions within the organism for the duration of its life. The genome is defined by the
DNA, which, when combined with protein molecules called histones, forms chromatin
fiber. These fibers form structures known as chromosomes, which are found in the
nucleus of eucaryotic cells.

Many procaryotic organisms, such as bacteria, carry only a single copy of each gene.
An organism with a single set of chromosomes is called haploid. Many unicellular and
all higher-order organisms are eucaryotic, and most of the latter carry two sets of
chromosomes, one inherited from each parent. Organisms with paired chromosomes are
called diploid. (In both cases the DNA itself is double-stranded, or double helical, as it
is more popularly known.)

Each DNA molecule contains many genes, each of which is a specific sequence of
nucleotide bases found at a particular position, or locus, on the chromosome¹. These
genes hold all the information necessary for constructing the proteins and enzymes used in
the biochemical reactions that maintain life². DNA, however, does not produce proteins
directly. A single-stranded copy of the gene, called ribonucleic acid (RNA), is created for
that purpose in a process called transcription. This process is similar to replication,
except that the end product is not an exact double-stranded copy of the entire DNA
molecule. The RNA molecule is not only single-stranded, but also significantly shorter,
because it is a copy of only a portion of the DNA. This RNA molecule leaves the nucleus
and is used to create proteins in the cell’s ribosomes³.

1 There are different types of DNA found in a cell, such as nuclear and mitochondrial DNA.
Both serve as templates for RNA and protein production, so the exact differences will not
be discussed. All references to DNA refer to nuclear DNA, although, in general, the
mechanisms apply to both types.

2 It is more correct to say that each gene holds the information for a specific polypeptide.
Proteins are often made from several polypeptide chains, each of which is the product of a
separate gene.


The RNA is interpreted and the protein is made by a process called translation. 


DNA --transcription--> RNA --translation--> protein


Figure 1. The Flow of the Genetic Code


Not all DNA sequences end up in the RNA that is translated into protein. In fact, the
genes of eucaryotic organisms are a mixture of coding and non-coding regions. The
intervening, non-coding sequences are called introns; they are transcribed into RNA but
are removed (spliced out) before translation. The regions that remain in the mature RNA
and are used for translation are called exons. Genes whose DNA is a complex mixture of
introns and exons are referred to as being split or discontinuous.

Proteins, made in the translation process, are polymers of amino acids, of which
there are twenty used in the synthesis of proteins (see Appendix B). The ribosomes are
able to retrieve the correct amino acid because each amino acid is encoded by a certain
sequence of nucleotide bases in the RNA strand; the amino acid is then added to the
protein being constructed. This sequence consists of three bases and is called a codon.
Each codon codes for a specific amino acid. Different codons that code for the same
amino acid are called synonyms (see Figure 2). What is commonly referred to as the
“genetic code” is the relation between the codons and the amino acids that they represent.
This code is nearly, but not absolutely, universal.
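The codon-to-amino-acid relation described above can be illustrated with a short Python sketch. It packs the standard genetic code into a 64-character string (bases ordered U, C, A, G at each position; `*` marks a stop codon). The function names `codon_to_amino`, `translate`, and `synonyms` are illustrative only and are not part of GPSYS or of this thesis' implementation.

```python
BASES = "UCAG"
# Standard genetic code: codon (b1, b2, b3) maps to AMINO[16*i + 4*j + k],
# where i, j, k are the positions of b1, b2, b3 in BASES. "*" = stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_to_amino(codon):
    """Return the one-letter amino acid code (or '*') for an RNA codon."""
    i, j, k = (BASES.index(b) for b in codon)
    return AMINO[16 * i + 4 * j + k]


def translate(rna):
    """Read the RNA three bases at a time until a stop codon is reached."""
    protein = []
    for pos in range(0, len(rna) - 2, 3):
        amino = codon_to_amino(rna[pos:pos + 3])
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


def synonyms(amino):
    """List every codon that codes for the given amino acid."""
    return [a + b + c for a in BASES for b in BASES for c in BASES
            if codon_to_amino(a + b + c) == amino]


print(translate("AUGUUUGCUGGAUAA"))  # MFAG
print(synonyms("A"))                 # ['GCU', 'GCC', 'GCA', 'GCG']
```

Alanine has four synonymous codons, while tryptophan (`W`) has only one, which is why degeneracy protects some amino acids more than others.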

Alleles are alternative forms of a gene that can occupy a particular chromosomal
site. Whereas a gene codes for a trait, such as eye color, its alleles are different
“settings” for that trait, such as blue or hazel eyes⁴. A gene is “expressed” when the
information coded within it is converted into structures present and operating within the
cell. Whether expressed or not, the particular set of genes contained in the organism’s
genome is called the genotype. The genotype gives rise to the phenotype, which is the
aggregate of physical and mental characteristics of the organism, such as eye color and
height.

3 There are several types of RNA. See Appendix B for clarification.

4 More precisely, the locus refers to the place on the chromosome where an allele resides.
An allele is just the bit of DNA at that place. A locus is a template for an allele; an allele
is an instantiation of a locus.
Figure 2. The Genetic Code


Genes are arranged in a fixed, linear order. This order can change, but this is rare.
Movable DNA segments, called transposons, occasionally jump around chromosomes,
thus fundamentally altering the chromosomal structure. In addition to moving whole
genes, transposons also scramble DNA, causing deletions, inversions, and other
rearrangements. It is becoming clear that such changes are a critical feature of
chromosomal evolution. Because recombination and transposition generate new
combinations of genes, they enlarge the repertoire from which natural selection chooses.
Gene rearrangements also regulate gene expression, because gene location and
orientation may determine whether a gene is silent or active.
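As a hypothetical computational analogue of the rearrangements described above, the sketch below treats a genome as a Python list of genes and implements a transposition (cut a segment and reinsert it elsewhere) and an inversion (reverse a segment in place). The operator names `transpose` and `invert` are assumptions for illustration; both only reorder genes and never change them.

```python
def transpose(genome, start, length, dest):
    """Cut genome[start:start + length] and reinsert it at index `dest`
    of the remaining sequence (a transposon-style jump)."""
    segment = genome[start:start + length]
    remainder = genome[:start] + genome[start + length:]
    return remainder[:dest] + segment + remainder[dest:]


def invert(genome, start, length):
    """Reverse the order of the genes in genome[start:start + length]."""
    segment = genome[start:start + length]
    return genome[:start] + segment[::-1] + genome[start + length:]


genes = ["A", "B", "C", "D", "E", "F"]
print(transpose(genes, 1, 2, 3))  # ['A', 'D', 'E', 'B', 'C', 'F']
print(invert(genes, 1, 3))        # ['A', 'D', 'C', 'B', 'E', 'F']
```

Both operators return new lists, so the parent genome is left intact for other offspring.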


2. Population Genetics 


Population genetics is the scientific discipline concerned with the genetic
basis of evolution. It studies the frequencies and fitness of genotypes in natural
populations. Evolution is the change in the frequencies of genotypes through time, with
the change possibly due to differences in fitness. Changes in genotype frequencies,
though, are not easily measured, because the time scale associated with the introduction
of most naturally occurring genetic variants is very long, which makes these changes
impossible to observe directly. Mathematical models of evolution are used instead, and the
behavior of each model is compared to that of the natural population.
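As an example of the kind of mathematical model referred to above, the following sketch assumes the textbook one-locus, two-allele haploid selection model: allele A has frequency p and fitness w_A, allele a has fitness w_a, and the next generation's frequency is p' = p * w_A / (p * w_A + (1 - p) * w_a). The choice of model and the function names are illustrative assumptions, not part of this thesis.

```python
def next_frequency(p, w_A, w_a):
    """One generation of haploid selection: p' = p * w_A / mean fitness."""
    mean_fitness = p * w_A + (1 - p) * w_a
    return p * w_A / mean_fitness


def trajectory(p0, w_A, w_a, generations):
    """Frequencies of allele A over successive generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], w_A, w_a))
    return freqs


# A 10% fitness advantage starting from a rare allele (p = 0.01).
for gen, p in enumerate(trajectory(0.01, 1.1, 1.0, 100)):
    if gen % 25 == 0:
        print(gen, round(p, 3))
```

Even a small fitness difference steadily shifts the allele frequency, which is the behavior such models are compared against in natural populations.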

One important effect that population genetics explains is the influence of dominant
alleles within the genotype on the evolution of the organism. The type of dominance an
allele exhibits determines how it affects the phenotype and whether or not the allele
will be maintained in the population. Allele dominance explains why some mutations
can be sustained within the population while others are not. Appendix C explains some
of these theories in more detail.
3. Some Biological Causes of Evolution 


The fundamental causes of life and its subsequent evolution are still a matter of
speculation. Scientists still cannot answer the questions of why life formed or what
caused species to differentiate. They can compare DNA or protein sequences and apply
elements of statistical geometry to determine when species diverged. They can offer
statistical arguments as to what happened to cause the changes, but the exact causes are
still unknown. Therefore, the following discussion is only a theory and, as with any
theory outside of mathematics, is potentially wrong. However, it should be emphasized
that the following theory does not try to prove the causes of evolution in vivo, but to find
useful evolutionary and genetic mechanisms for which computational analogues can be
made.

Most genes of higher eucaryotes have many introns. Lower eucaryotes have a
much higher proportion of continuous genes. Comparisons of the DNA sequences of
genes encoding proteins that are highly conserved in evolution suggest that introns were
present in ancestral genes and were lost during the evolution of organisms that have
become optimized for very rapid growth, such as bacteria. The presence of introns has
had an evolutionary effect, especially with regard to the development of complex
organisms [Ref 8].

Exons encode discrete functional units of proteins or can encode the whole
protein. The former has been demonstrated, which leads to the attractive hypothesis that
new proteins arose in evolution by the rearrangement of exons [Ref 9]. Shuffling exons
is a rapid and efficient means of generating novel genes because it preserves functional
units while allowing them to interact in new ways. Introns are regions where DNA can
break and recombine without disrupting the coding sequence. Therefore, the presence of
introns increases genetic variation by allowing more discontinuity: the greater the
discontinuity of the DNA, the more exons it contains, and thus the more possible
combinations of exons.

Since the developmental potential of an organism is determined by its genes, 
DNA must necessarily mutate as organisms evolve. But evolutionary changes occur only 
rarely. Since living cells require the correct functioning of thousands of proteins, each of
which could be damaged by a mutation at many different sites in its gene, it is
clear that DNA sequences must almost always be passed on unchanged if progeny are to
have a good chance of survival.

Naturally occurring mutations include almost all conceivable changes in DNA 
sequences (see Appendix B). Mutations that have only a subtle effect on a gene product, 
such as temperature-sensitive mutations, are often the result of a simple switch of one 
base for another. However, some natural mutations destroy the function of a gene
completely. These more drastic changes, called null mutations, include not only base
switches and the insertion or deletion of a single base, but also extensive insertions and
deletions and even gross rearrangements of chromosome structure. Such changes might
be caused, for example, by the insertion of a transposon, which typically places many 
thousands of bases of foreign DNA in the coding sequence of a gene, or by an aberrant 
cellular recombination process. 

One way to reduce the chance of a harmful mutation occurring is to increase the
area in which a mutation can occur that will either have no effect or at least not harm the
organism⁵. Introns are usually extensive in length and account for about half of the total
DNA molecule. Having such a large portion of inactive regions decreases the chance of a
harmful mutation occurring in an exon⁶. Thus, introns give additional protection
from harmful mutations.

5 Correction enzymes ensure that replication and recombination occur with very few errors.
Excluding radical modifications of DNA brought about by outside factors such as radiation,
DNA has built-in repair mechanisms.
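The protective effect of introns can be made concrete with a small, hypothetical model: a genome is a list of `(bases, is_exon)` segments, expression splices out the introns, and a point mutation is equally likely to strike any base (the same equal-probability assumption stated in footnote 6). Trying a substitution at every position shows that the fraction of mutations that change the expressed sequence equals the exon fraction of the genome. The representation and function names are assumptions for illustration, not the GPSYS encoding used in this thesis.

```python
# Hypothetical genome: (bases, is_exon) segments; introns sit between exons.
GENOME = [("AUG", True), ("GUAAGU", False), ("GCU", True),
          ("CCUUAG", False), ("GGA", True)]


def express(genome):
    """Splice out introns: concatenate only the exon segments."""
    return "".join(seq for seq, is_exon in genome if is_exon)


def point_mutate(genome, index, new_base):
    """Replace the base at flat position `index`, counting intron bases too."""
    mutated, offset = [], 0
    for seq, is_exon in genome:
        if offset <= index < offset + len(seq):
            local = index - offset
            seq = seq[:local] + new_base + seq[local + 1:]
        mutated.append((seq, is_exon))
        offset += len(seq)
    return mutated


def fraction_affecting_exons(genome):
    """Try one substitution at every position (each equally likely) and
    return the fraction that changes the expressed sequence."""
    flat = "".join(seq for seq, _ in genome)
    original = express(genome)
    hits = 0
    for index, base in enumerate(flat):
        new_base = "C" if base == "A" else "A"  # always differs from `base`
        if express(point_mutate(genome, index, new_base)) != original:
            hits += 1
    return hits / len(flat)


print(express(GENOME))                   # AUGGCUGGA
print(fraction_affecting_exons(GENOME))  # 9/21, about 0.43
```

Lengthening the introns while keeping the exons fixed lowers this fraction, which is the protective effect described in the text.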

Recombination is not accidental, but is instead an essential cellular process 
catalyzed by enzymes which are made by the cell’s ribosomes. Besides providing genetic 
variation, recombination enzymes allow cells to retrieve sequences lost when DNA is 
damaged. By switching specific segments within chromosomes, cells put dormant genes 
into sites where they can be expressed, even creating new protein-coding regions (see 
Appendix C). 

Synonyms in the genetic code demonstrate the idea of degeneracy. If each amino acid
were encoded by only one codon, then only twenty of the 64 codons would designate amino
acids and the remaining 44 would be stop signals. The probability of mutating to a chain
termination signal would then be much greater, and chain-termination mutations usually
lead to inactive proteins. Substitutions of one amino acid for another, by contrast, are
often relatively harmless. Degeneracy also allows the DNA base composition to vary over
a wide range without altering the amino acid sequence of the proteins encoded by the DNA.
Therefore, degeneracy allows mutations to occur while increasing the probability that the
mutation will not be deleterious. It also increases the variation of the genotype without
destroying necessary functions.
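The benefit of degeneracy can be checked by classifying all nine single-base substitutions of a codon as synonymous (same amino acid), missense (a different amino acid), or nonsense (a stop codon). The sketch below uses a compact encoding of the standard genetic code; the function name `substitution_outcomes` is hypothetical.

```python
BASES = "UCAG"
# Standard genetic code, indexed 16*i + 4*j + k for bases ordered U, C, A, G.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_to_amino(codon):
    i, j, k = (BASES.index(b) for b in codon)
    return AMINO[16 * i + 4 * j + k]


def substitution_outcomes(codon):
    """Classify all nine single-base substitutions of an RNA codon."""
    original = codon_to_amino(codon)
    counts = {"synonymous": 0, "missense": 0, "nonsense": 0}
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon_to_amino(codon[:pos] + base + codon[pos + 1:])
            if mutant == original:
                counts["synonymous"] += 1
            elif mutant == "*":
                counts["nonsense"] += 1
            else:
                counts["missense"] += 1
    return counts


print(substitution_outcomes("GCU"))  # {'synonymous': 3, 'missense': 6, 'nonsense': 0}
print(substitution_outcomes("UGG"))  # {'synonymous': 0, 'missense': 7, 'nonsense': 2}
```

For GCU (alanine), every third-position change is silent; UGG (tryptophan), which has no synonyms, has no silent substitutions and two that create stop codons.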

Diploidy, as opposed to haploidy, allows for more variation in an individual’s genome,
and hence in the entire population. In a haploid organism, there might be n different
alleles at a single locus, so the individuals in that population can have n different
genotypes resulting from differences at that locus. In a diploid organism, there are n
homozygous combinations and n(n-1)/2 non-repetitive heterozygous combinations⁷, for a
total of n(n+1)/2 genotypes at that locus.
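The genotype count above can be verified by enumeration. In this sketch each diploid genotype at a locus is treated as an unordered pair of alleles, so `itertools.combinations_with_replacement` produces exactly the n homozygous and n(n-1)/2 heterozygous combinations. As in footnote 7, viability is ignored; the allele names are placeholders.

```python
from itertools import combinations_with_replacement


def diploid_genotypes(alleles):
    """All unordered allele pairs at one locus (viability ignored)."""
    return list(combinations_with_replacement(alleles, 2))


alleles = ["a1", "a2", "a3", "a4"]
genotypes = diploid_genotypes(alleles)
homozygous = [g for g in genotypes if g[0] == g[1]]
heterozygous = [g for g in genotypes if g[0] != g[1]]
n = len(alleles)
print(len(homozygous), len(heterozygous), len(genotypes))  # 4 6 10
```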

Diploid organisms also follow Mendel’s law of dominance⁸. Dominant and recessive
features provide an interesting dynamic within the population’s genotype, allowing for
increased complexity. The idea of dominant and recessive traits is important in evolution
because it often determines the viability of the individual in the population and the
environment. Those individuals exhibiting dominant traits are often said to be better
adapted to the environment and thus more likely to survive. However, having dominance
in one area does not mean the individual is the most fit in the population, nor is it a
guarantee that offspring will retain the trait.

6 This assumption holds only if every nucleotide or base pair in the strand has an equal
probability of mutation. There are examples of regions where mutation is more likely to
occur (see Appendix B), but for the highly accurate replication process, all points are
assumed to have an equal probability.

7 This does not take into account combinations that would lead to non-viable individuals
and so would never naturally occur.

8 This is a drastic simplification. Not all alleles are entirely dominant or entirely
recessive. See Appendix C for further explanation.
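A minimal sketch of complete dominance, assuming each allele is given a hypothetical dominance rank and the higher-ranked allele of a diploid pair is the one expressed. Crossing two heterozygous parents shows how a recessive allele survives in the genotype while appearing in the phenotype of only one offspring combination in four. Partial dominance and co-dominance (footnote 8) are deliberately not modeled.

```python
from collections import Counter
from itertools import product


def phenotype(genotype, dominance_rank):
    """Complete dominance: the allele with the higher rank is expressed."""
    return max(genotype, key=lambda allele: dominance_rank[allele])


def offspring_phenotypes(parent_a, parent_b, dominance_rank):
    """Count phenotypes over the four equally likely allele combinations
    (one allele from each parent), as in a Mendelian cross."""
    return Counter(phenotype(pair, dominance_rank)
                   for pair in product(parent_a, parent_b))


rank = {"B": 1, "b": 0}  # hypothetical: B is dominant over b
print(phenotype(("B", "b"), rank))                          # B
print(offspring_phenotypes(("B", "b"), ("B", "b"), rank))  # Counter({'B': 3, 'b': 1})
```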

In summary, introns, degeneracy within the genetic code, diploidy, and dominance have
all played a role in evolution. They have either acted to reduce the deleterious effects
of mutation or acted to increase the complexity of DNA. These in turn affect the
possibilities of how a species can change over generations.
4. Biological Basis of Evolutionary Computation 


Because evolution is, in effect, a method of searching among an enormous 
number of possibilities for a “solution,” it has inspired researchers trying to solve 
computational problems. In biology, the large number of possibilities is the set of 
possible genetic sequences, and the desired “solutions” are highly fit organisms. A 
highly fit organism is one capable of surviving and reproducing in its environment. For 
those computational problems that require programs to be adaptive—i.e., continuing to 
perform well in a changing environment—evolution is capable of searching a constantly 
changing set of possibilities. Furthermore, evolution is a massively parallel search 
method: rather than working on one species at a time, evolution tests and changes 
millions of species in parallel. 

EC, which will be more thoroughly discussed in the next section, is the blanket
term used to describe the class of computer-based problem-solving systems that use
computational models of known mechanisms of natural evolution as key elements in their
designs and implementations. All of these systems simulate the evolution of individual
structures within a population of structures via the processes of selection, mutation, and
recombination. Each individual in the population has some type of genetic material and
receives a measure of its fitness in the environment based on some type of evaluation
of that genetic material.

From the previous section, one can see that biologists have identified many
principles, from the genetic level and higher, that govern the evolution of living things.
At the highest level, the theory of natural selection governs the evolutionary adaptation
of the biological world. Natural selection operates on the organism through its
performance on one specific task: the production of offspring. In Darwin’s words, it is
“survival of the fittest.” Those individuals that are fittest in the environment are more
likely to survive and thus propagate their genetic material. Individuals that reproduce
are chosen in the selection process. In the computational world, there are several ways to
accomplish selection, but usually the fitness is considered.
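Fitness-proportionate (roulette-wheel) selection is one common way to let fitness drive the choice of parents. The sketch assumes non-negative fitness values and uses Python's `random.Random.choices` with weights; tournament selection is a common alternative. The function name `roulette_select` is hypothetical.

```python
import random


def roulette_select(population, fitnesses, k, rng):
    """Fitness-proportionate selection with replacement: each draw picks
    individual i with probability fitnesses[i] / sum(fitnesses).
    Fitnesses must be non-negative and not all zero."""
    return rng.choices(population, weights=fitnesses, k=k)


rng = random.Random(42)
population = ["weak", "average", "strong"]
fitnesses = [1.0, 3.0, 6.0]
parents = roulette_select(population, fitnesses, 1000, rng)
# Counts come out roughly proportional to fitness.
print({name: parents.count(name) for name in population})
```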

During the reproduction process, assuming reproduction is not asexual, some 
genetic material is taken from one parent, and some from the other. Recombination can 
occur either during gamete formation, from which the offspring will be subsequently 
formed, or can occur directly to create the new individual. In EC, recombination is better 
known as crossover, although even in biology, these terms are synonymous (see 
Appendix B). 

Computational mutation is generally done during the crossover procedure. It can 
occur during gamete formation, the actual crossover, or both. It is generally a stochastic
process, with each locus in the genetic material having a certain probability of being
mutated during the procedure. Mutations that occur outside the process of “being
born” are called “cosmic ray” mutations.
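The crossover-with-mutation step described above can be sketched for fixed-length bit strings: one-point crossover swaps the parents' tails after a cut point, and each locus of the resulting child is then flipped independently with probability `p_mutation`. The bit-string representation and the function names are assumptions for illustration.

```python
import random


def one_point_crossover(parent_a, parent_b, cut):
    """Swap the tails of two equal-length parents after index `cut`."""
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]


def mutate(bits, p_mutation, rng):
    """Flip each locus independently with probability p_mutation."""
    return [1 - bit if rng.random() < p_mutation else bit for bit in bits]


rng = random.Random(7)
mom, dad = [0] * 8, [1] * 8
cut = rng.randint(1, len(mom) - 1)    # cut strictly inside the string
child_a, child_b = one_point_crossover(mom, dad, cut)
child_a = mutate(child_a, 0.05, rng)  # mutation applied during crossover
print(cut, child_a, child_b)
```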

In summary, EC techniques maintain a population of structures, called 
individuals, that evolve according to the rules of selection, recombination and mutation 
specified by the user. Each individual in the population receives a measure of its fitness 
in the environment. Reproduction usually focuses on highly fit individuals, thus using 
their fitness information to keep their genetic material within the population. 
Recombination and mutation, which perturb the genetic material during reproduction in
order to create new individuals, are a means to explore the fitness landscape and find
better solutions.


B. INTRODUCTION TO EVOLUTIONARY COMPUTATION 


There are several main methods in the evolutionary computation field, 
distinguished mainly by the types of structures that comprise the individuals in the 
population. These differences determine the factors by which one individual may differ 
from another, and thus the allowable genetic variation. Equally important differences
also exist in the genetic operators used to create offspring, as well as in many of the
selection procedures based on fitness and other parameters. The predominant styles are
Evolution Strategies (ES), Evolutionary Programming (EP), Genetic Algorithms (GA), 
and Genetic Programming (GP). 

These different styles, while representing true differences in approach, were each 
developed by different and initially unrelated groups of people with little cross- 
fertilization of ideas in the early days of development. It is a commonly held opinion that 
even though more individuals have applied more than one of these approaches to solving 
problems, the number of people who could be considered truly interdisciplinary is small. 
The relationships and individual strengths and weaknesses of each of these styles are just 
beginning to be understood [Ref 10]. 

ESs are frequently associated with engineering optimization problems. The 
structures that undergo adaptation are typically sets of physically measurable objective 
variables that are associated with similarly measurable strategy variables in an individual. 
Fitness is determined by executing task specific routines and algorithms using objective 
variables as parameters. Strategy variables control the way in which mutation varies each 
objective variable during the production of new individuals. Recombination is usually 
applied to both objective variables and strategy variables. 
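A minimal (1+1)-ES sketch illustrates the relationship between objective and strategy variables: the objective variables `x` are mutated with a Gaussian step whose size is the strategy variable `sigma`, and `sigma` itself is mutated log-normally first. The sphere function stands in for a task-specific fitness routine, and because a (1+1)-ES has a single parent, recombination is omitted; both are simplifying assumptions.

```python
import math
import random


def sphere(x):
    """Stand-in objective: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


def one_plus_one_es(x, sigma, generations, rng):
    """(1+1)-ES with a self-adapted step size.

    x     -- objective variables (the parameters being optimized)
    sigma -- strategy variable controlling the mutation step size
    """
    tau = 1.0 / math.sqrt(len(x))
    fx = sphere(x)
    for _ in range(generations):
        # Mutate the strategy variable first, then use it to mutate x.
        child_sigma = sigma * math.exp(tau * rng.gauss(0.0, 1.0))
        child = [v + child_sigma * rng.gauss(0.0, 1.0) for v in x]
        f_child = sphere(child)
        if f_child <= fx:  # "plus" selection: keep the better of parent and child
            x, sigma, fx = child, child_sigma, f_child
    return x, sigma, fx


best_x, best_sigma, best_f = one_plus_one_es([3.0, -2.0, 1.0], 1.0, 500,
                                             random.Random(0))
print(best_f)
```

Because a child replaces its parent only when it is at least as good, the objective value never gets worse, and step sizes that produce successful children are inherited along with them.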

EP operates on a variety of representational structures, frequently real-valued 
objective variables or finite state machines. The objective variables are arguments to task 
specific routines and algorithms designed to solve a specific problem. Mutation is the 
only genetic operator employed, with significant strategy built into the overall algorithm
to direct the mutation in a computationally beneficial direction. 

GAs usually operate on fixed length character strings, often binary, as the 
structure undergoing adaptation. Other representational structures are possible. Fitness 
is determined by executing task specific routines and algorithms using an interpretation 
of the character string as their parameters. Crossover is the principal genetic operator 
employed, with mutation usually included as an operator of secondary importance. 
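The following sketch puts the GA elements together on fixed-length binary strings, using the OneMax problem (fitness is the number of 1 bits) as a stand-in for a task-specific fitness routine. Crossover is applied with high probability and mutation with low probability, matching the roles described above. Tournament selection and single-individual elitism are illustrative choices, not taken from this thesis.

```python
import random


def onemax(bits):
    """Stand-in fitness: the number of 1 bits in the string."""
    return sum(bits)


def tournament(population, rng, size=2):
    """Pick `size` individuals at random and return the fittest."""
    return max(rng.sample(population, size), key=onemax)


def run_ga(length=20, pop_size=30, generations=40,
           p_cross=0.9, p_mut=0.01, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        # Elitism: copy the current best individual unchanged.
        next_population = [list(max(population, key=onemax))]
        while len(next_population) < pop_size:
            a, b = tournament(population, rng), tournament(population, rng)
            if rng.random() < p_cross:  # crossover is the principal operator
                cut = rng.randint(1, length - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                # Mutation is secondary: a small per-bit flip probability.
                mutated = [1 - g if rng.random() < p_mut else g for g in child]
                if len(next_population) < pop_size:
                    next_population.append(mutated)
        population = next_population
        best_per_generation.append(max(onemax(ind) for ind in population))
    return best_per_generation


history = run_ga()
print(history[0], history[-1])
```

Elitism guarantees that the best fitness never decreases from one generation to the next.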

GP is an offshoot of Genetic Algorithms in which the computer structures that 
undergo adaptation are themselves computer programs. Specialized genetic operators are 
used which generalize crossover and mutation for the computer programs undergoing
adaptation. Generally, the programs being evolved are represented as trees.
C. GENETIC PROGRAMMING FUNDAMENTALS


John Koza developed GP in 1992, although he originally applied GAs to Lisp
expressions. Other languages have since implemented GP techniques, but they require
that the program be represented in a hierarchical structure (i.e., a tree). Regardless of the
implementation, Koza says that there are six steps to solving a problem using GP. These
steps are choosing (1) the terminals, (2) the functions, (3) the control parameters, (4) the
termination criterion, and (5) the fitness function, and determining (6) the program’s
architecture. The functions are the internal nodes of the tree, while the terminals are the
leaves. The control parameters are similar to those in GAs: the population size, the
number of generations, and the probabilities of crossover and mutation. The termination
criterion is usually met when a program is found that solves the given problem, or after
a certain number of generations. The fitness function is similar to that used in GAs. The
last step refers to defining the number of automatically defined functions (ADFs)
[Ref 11].
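To make the GP terminology concrete, the sketch below represents a program as a nested tuple whose internal nodes are functions and whose leaves are terminals, builds random trees with the "grow" method, and scores a tree by its summed absolute error against a target. The function set, terminal set, and target `x*x + x` are hypothetical choices for illustration only.

```python
import operator
import random

# Hypothetical function set (internal nodes) and terminal set (leaves).
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMINALS = ["x", 1.0]


def random_tree(depth, rng):
    """'Grow' initialization: stop at a terminal at depth 0 or at random."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    return (name, random_tree(depth - 1, rng), random_tree(depth - 1, rng))


def evaluate(tree, x):
    """Evaluate a tree: tuples are function nodes, anything else a terminal."""
    if isinstance(tree, tuple):
        name, left, right = tree
        return FUNCTIONS[name](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def fitness(tree, target, points):
    """Sum of absolute errors against the target (lower is better)."""
    return sum(abs(evaluate(tree, x) - target(x)) for x in points)


tree = ("+", ("*", "x", "x"), "x")  # represents x*x + x
print(evaluate(tree, 3.0))                                  # 12.0
print(fitness(tree, lambda x: x * x + x, [0.0, 1.0, 2.0]))  # 0.0
print(random_tree(3, random.Random(1)))
```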

Along with these six steps, Koza also defined the necessary conditions for 
terminals and functions. These conditions must be chosen so that they are able to express 
a solution to a given problem. This property is known as sufficiency. The second 
property is that of closure, which is satisfied when each of the functions can accept as its
arguments any value or data type that might be returned by any function or be taken on
by any terminal. As long as this property is fulfilled, crossover is possible and the
offspring tree will be syntactically correct and executable. One method of maintaining
closure is to use a Strongly Typed Genetic Program (STGP). In an STGP, variables,
constants, arguments, and returned values can be of any data type, with the provision that
the data type for each value is specified beforehand [Ref 12].
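Two common ways of dealing with the closure requirement are sketched below: Koza-style protected division, which returns 1 instead of failing when the divisor is zero, and an STGP-style type check that accepts a tree only if every argument's type matches its function's declared signature. The signature table and terminal types are hypothetical.

```python
def protected_div(a, b):
    """Koza-style protected division: returns 1.0 when the divisor is zero,
    so '/' can safely accept any value produced by another node."""
    return a / b if b != 0 else 1.0


# Hypothetical STGP signatures: name -> (argument types, return type).
SIGNATURES = {
    "+": (("float", "float"), "float"),
    "<": (("float", "float"), "bool"),
    "if": (("bool", "float", "float"), "float"),
}
TERMINAL_TYPES = {"x": "float", "1.0": "float", "true": "bool"}


def return_type(tree):
    """Return a tree's type, raising TypeError if any argument's type does
    not match the declared signature of its parent function."""
    if isinstance(tree, tuple):
        name, *args = tree
        arg_types, result = SIGNATURES[name]
        if len(args) != len(arg_types):
            raise TypeError(f"{name} expects {len(arg_types)} arguments")
        for arg, expected in zip(args, arg_types):
            actual = return_type(arg)
            if actual != expected:
                raise TypeError(f"{name} expects {expected}, got {actual}")
        return result
    return TERMINAL_TYPES[tree]


print(protected_div(6.0, 3.0), protected_div(1.0, 0.0))    # 2.0 1.0
print(return_type(("if", ("<", "x", "1.0"), "x", "1.0")))  # float
```

In an STGP, crossover would only swap subtrees whose return types match, so every offspring still passes this check.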

The use of ADFs in GP induces a divide-and-conquer strategy in which the problem to be
solved is decomposed into smaller subproblems. The subproblems are usually easier to
solve, and their results can be combined to build the solution to the original problem.
The solutions to the subproblems are usually subroutines of the whole program. In GP,
these subroutines can be obtained by using special subtrees having a fixed number of
branches. Such a subtree can then be encapsulated to act as a terminal for another
program, because it returns a value that its parent node uses in evaluating its function.
III. OVERVIEW OF PREVIOUS WORK


A. THE COMPILING GENETIC PROGRAMMING SYSTEM (CGPS) 


Most GP approaches use an interpreter to execute the programs that are developed from
the provided problem-specific language: the code segments in the population are decoded
at runtime by a virtual machine. However, interpreted code usually incurs a large
overhead and executes much more slowly than compiled code. In 1993, Peter Nordin created
the Compiling Genetic Programming System (CGPS), which used the machine’s native code as
the programs in the population and thereby eliminated the interpreter [Ref 13]. He used
the CGPS to build a classifier function that distinguishes Swedish nouns from non-nouns
strictly by their spelling. He compared the execution of the CGPS with that of a combined
neural network and interpreting GP system applied to the same problem, and the CGPS
performed significantly better. The CGPS is now commercially available through AIM
Learning Systems™ as the Discipulus™ and Discipulus Pro™ programs, which are
GP/Simulated Annealing programs used to conduct automatic learning and
learning/optimization tasks at the machine code level⁹.

One other significant feature of Nordin’s work is that he used the C language. The
individuals in the population are machine code sequences resembling a standard C
function (see Figure 3). Valid C functions are put together at runtime, directly in
memory, by a GA.

The header of the function gets its arguments from the stack, and the footer cleans up
after the function has completed. The same header and footer are added to each
individual when the population is initialized, and the mutation and crossover operations
are prevented from modifying these sections.

The return instruction forces the system to leave the function and return control to
the calling procedure. If the system is implemented on a Complex Instruction Set Computer
(CISC), the placement of the return instruction is allowed to vary, because CISC
computers have variable-length instructions; on such a system, the return instruction is
given a range within which it can be placed. The CGPS, however, was tested on a limited
Sun-4 instruction set with programs of fixed length.


9 The author did not ascertain the differences between the CGPS and the Discipulus™ products.
Figure 3. Generic Machine Code Function Structure
The instruction body is the actual program that evaluates the function. Two classes of
instructions are used. Instructions in the first class operate between registers or act
as unary functions. Instructions in the second class require a constant operand to
immediately follow the operator. The CGPS uses only two-register Sun-4 instructions
limited to two addressing modes; a total of twenty-four instructions were used.

The individuals in the population are randomly generated and run through a GA using
steady-state tournament selection and standard uniform crossover. Mutation can act on
the operator or, if one exists, the operand. If the operator is mutated, it can only be
changed to a member of the set of approved instructions. No jumps, illegal instructions
or loops are allowed.
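The structure described above (a protected header and footer around a mutable body) can be sketched in Java as follows. The mnemonics are placeholders rather than Sun-4 instructions, the class name is hypothetical, and keeping a mutated operator within its own instruction class (so that an immediate instruction keeps its operand) is an assumption about how "approved instructions" might be enforced. Mutation touches only the body, changing either an operator or the constant operand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of a CGPS-style individual: a protected header and footer
// around a body of instructions that evolution is allowed to change.
class MachineCodeIndividual {
    // Placeholder mnemonics; the real system used Sun-4 machine instructions.
    static final List<String> HEADER = List.of("save", "load-args");
    static final List<String> FOOTER = List.of("restore", "return");
    static final List<String> REG_OPS = List.of("add", "sub", "mul", "and"); // register-to-register
    static final List<String> IMM_OPS = List.of("addi", "subi");             // need a constant operand

    // One body instruction: operator plus optional immediate constant.
    static final class Instr {
        final String op;
        final Integer operand; // null for register-to-register instructions
        Instr(String op, Integer operand) { this.op = op; this.operand = operand; }
    }

    final List<Instr> body = new ArrayList<>();

    static MachineCodeIndividual random(Random rng, int length) {
        MachineCodeIndividual ind = new MachineCodeIndividual();
        for (int i = 0; i < length; i++) {
            if (rng.nextBoolean()) ind.body.add(new Instr(REG_OPS.get(rng.nextInt(REG_OPS.size())), null));
            else ind.body.add(new Instr(IMM_OPS.get(rng.nextInt(IMM_OPS.size())), rng.nextInt(256)));
        }
        return ind;
    }

    // Mutate one body instruction; the header and footer are never touched.
    void mutate(Random rng) {
        int i = rng.nextInt(body.size());
        Instr old = body.get(i);
        if (old.operand != null && rng.nextBoolean()) {
            body.set(i, new Instr(old.op, rng.nextInt(256)));           // mutate the constant
        } else {
            List<String> ops = old.operand == null ? REG_OPS : IMM_OPS;  // stay in the approved set
            body.set(i, new Instr(ops.get(rng.nextInt(ops.size())), old.operand));
        }
    }

    // The full function: header, body, footer (no jumps or loops are ever generated).
    List<String> program() {
        List<String> p = new ArrayList<>(HEADER);
        for (Instr ins : body) p.add(ins.operand == null ? ins.op : ins.op + " " + ins.operand);
        p.addAll(FOOTER);
        return p;
    }

    // Checks that many mutations leave the frame intact and use only approved operators.
    static boolean invariantHolds(long seed, int steps) {
        Random rng = new Random(seed);
        MachineCodeIndividual ind = random(rng, 12);
        for (int s = 0; s < steps; s++) ind.mutate(rng);
        List<String> p = ind.program();
        boolean frameIntact = p.subList(0, HEADER.size()).equals(HEADER)
                && p.subList(p.size() - FOOTER.size(), p.size()).equals(FOOTER);
        boolean approved = ind.body.stream().allMatch(ins ->
                ins.operand == null ? REG_OPS.contains(ins.op) : IMM_OPS.contains(ins.op));
        return frameIntact && approved && ind.body.size() == 12;
    }
}
```

Because the body length never changes, this sketch also mirrors the fixed-length programs used in the Sun-4 experiments.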

For his experiments, the population size was varied between 20 and 4,000 individuals,
with 40,000 individuals processed in total. Typical individual sizes tested were 7, 12
and 28 instructions. The population was allowed to run through 21 training sets. He
concluded that his population was able to learn the training set about as well as the
neural network¹⁰. The benefit of the CGPS in this instance is that it took only one minute
to train, compared to the neural network’s 235 minutes. Individuals in the CGPS also
needed only approximately 50 bytes of memory, as opposed to the neural network’s 450
bytes. He also concluded that smaller populations worked better than larger ones.

10 The CGPS learned 86% of the words, compared to 89% for the neural network. On unknown words,
the CGPS classified 72% correctly, compared to the neural network's 69%.


B. THE GENETIC EVOLUTION OF MACHINE LANGUAGE SOFTWARE 
SYSTEM (GEMS) 


In 1995, Ronald L. Crepeau developed the Genetic Evolution of Machine Language
Software (GEMS) system, in which he used a large set of generalized operators and
instructions rather than problem-specific ones. It was thought that a large, generalized
set of operators and terminals would make the GP process inefficient, and he wanted to
determine whether that assumption was true¹¹ [Ref 14]. Unlike Nordin’s system, though, his
did not go on to a commercial implementation.

GEMS consisted of three parts: the microprocessor emulator, a pool of machine language
(ML) programs, and the Genetic Process Controller (GPC). Like Nordin, he decided to work
at the machine level in order to avoid the need to compile. He wrote an emulator for the
Z80™ architecture, an 8-bit microprocessor with a large instruction set, a 64-kilobyte
addressing range, and input-output features. The emulation included seven 8-bit-wide
registers, some of which could be combined to form 16-bit registers, as well as the Stack
Pointer (SP), Program Counter (PC), and Flag (F) registers.

The individual instructions of pool members (i.e., the ML programs mentioned above)
were implemented using a C structure that defined the number of bytes in the instruction
and an array that held the actual instruction. A pool member’s set of instructions was
stored as an array of these structures. Memory contents were stored within the pool
member itself, with program memory and data memory segregated.

The GPC generated new pool members, linked them for execution within the emulator,
evaluated the fitness of pool members, and controlled the breeding, mutation and
survival of pool members. Pool members were bred pair-wise and could be bred one or more
times. A double crossover was used, with preference given to the fitter of the two
parents, and only one child was produced per crossover. Each offspring was evaluated, and
if it was found to be more fit than either of its parents, it replaced the weaker parent
in the pool. If it was not more fit than its parents, it did not survive.


11 He also wanted to see the effects of adding memory.




There were two types of mutation in GEMS. The first replaced a random number of
contiguous program and memory values in an offspring prior to its fitness evaluation.
The second randomly and completely replaced a weak pool member.
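The GEMS survival rule can be sketched in a few lines of Java. The class name is hypothetical, higher fitness is assumed to be better, and "more fit than either of its parents" is read as "fitter than at least one parent", the reading under which replacing the weaker parent makes sense. The pool here stores only fitness values, which is a simplification of the sketch.

```java
// Hypothetical sketch of the GEMS survival rule (higher fitness assumed better).
class GemsReplacement {
    // Index (0 or 1) of the parent the child replaces, or -1 if the child is not
    // fitter than either parent and therefore does not survive.
    static int replaceIndex(double fitnessA, double fitnessB, double childFitness) {
        if (childFitness <= fitnessA && childFitness <= fitnessB) return -1;
        return fitnessA <= fitnessB ? 0 : 1; // the weaker parent gives way
    }

    // Applies the rule to a pool of fitness values; returns true if the child survived.
    static boolean offer(double[] pool, int parentA, int parentB, double childFitness) {
        int r = replaceIndex(pool[parentA], pool[parentB], childFitness);
        if (r < 0) return false;
        pool[r == 0 ? parentA : parentB] = childFitness;
        return true;
    }
}
```

Since only one child is produced per crossover and it either displaces a parent or disappears, the pool size stays constant, which makes this a steady-state scheme.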

GEMS was applied to the problem of generating an ML agent that, in one run, would
output the string “Hello World”. The fitness of an individual was based on outputting the
correct string, and additional value was awarded if the ML agent could output the correct
string in the fewest instructions.

Three benchmarks were recorded for each run. Crepeau measured the number of
generations until a pool member first output the correct string, the number of
generations until a correctly performing ML agent was stable within the pool, and the
number of generations until the shortest agent in the pool was less than 100 instructions
long. Runs were normally not terminated until all three benchmarks were reached, but in
some cases ML agents of less than 100 instructions were not produced.

Pool sizes varied between 150 and 2000 members, and each configuration was run sixteen
times. For pools of fewer than 500 members, the results showed a large standard deviation,
so for this particular problem a larger population performed better. One interesting
conclusion was that the complexity of the problem appears to increase linearly, rather
than combinatorially or exponentially, as the length of the output string increases.
Although there are specific issues resulting from the GEMS implementation and the
problem to which it was applied, he showed that “agents of simple functionality can be
generated with a GP process that involves a large number of ML operators and memory as
implemented in GEMS” [Ref 15].
C. DIPLOIDY/DOMINANCE IN GENETIC SEARCH


Although not the first example of using diploidy and dominance in a GA, in [Ref 16]
F. Greene considers the diploid chromosome as two chromosomes with one gene each. He
used a 0-1 knapsack problem in order to see how the GA performed on a problem with a
changing global optimum. He speculated that if a similar environment arose repeatedly,
diploidy could enable the species to rapidly re-express its former fitness for that
environment.

Individuals of the diploid population were represented as C++ objects based on the
Genitor GA [Ref 17]. The Genitor GA originally worked only on a haploid chromosome, but it
was modified so that the initial population would replicate the values from the first
homologue into the second. Member functions were added to the class to handle diploid
fitness evaluation.

He based his dominance relationship on the enzyme production of some genes in
biological systems. The homozygous dominant and the heterozygote both exhibit the
dominant feature, but the homozygote actually produces more of the enzyme, specifically
twice as much as the heterozygote. In his organisms, the allele values are the fitness
values of the two strings at each gene in the corresponding chromosomes. These fitness
values represent the intermediate (the enzyme) that is produced, and they compete before
the observed phenotype is produced. Dominance is implemented by mapping the two
intermediate, or sub-phenotype, fitness values to the scalar fitness value that is
subsequently used for selection. The mapping function, referred to as the dominance
function, simply chooses the maximum of the two values.
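The dominance function can be sketched in Java on the 0-1 knapsack problem mentioned above. Each homologue is evaluated separately to a sub-phenotype fitness, and the maximum of the two becomes the scalar fitness used for selection. The class name is hypothetical, and scoring an over-weight homologue as zero is an assumption of this sketch rather than a detail reported for Greene's system.

```java
// Hypothetical sketch of diploid fitness with a max() dominance function,
// using a 0-1 knapsack problem.
class DiploidFitness {
    // Fitness of one homologue: total value of the packed items, or 0 if the
    // weight limit is exceeded (this penalty is an assumption of the sketch).
    static double knapsackValue(boolean[] genes, int[] weights, int[] values, int capacity) {
        int weight = 0, value = 0;
        for (int i = 0; i < genes.length; i++) {
            if (genes[i]) { weight += weights[i]; value += values[i]; }
        }
        return weight <= capacity ? value : 0.0;
    }

    // Dominance function: the larger sub-phenotype fitness is the one expressed.
    static double dominance(double first, double second) {
        return Math.max(first, second);
    }

    // Scalar fitness used for selection.
    static double diploidFitness(boolean[] h1, boolean[] h2, int[] weights, int[] values, int capacity) {
        return dominance(knapsackValue(h1, weights, values, capacity),
                         knapsackValue(h2, weights, values, capacity));
    }
}
```

With weights {2, 3, 4}, values {3, 4, 5} and capacity 5, a homologue packing the first two items scores 7 and one packing only the third scores 5, so the diploid individual is assigned fitness 7. The weaker homologue is carried along unexpressed, which is what would let it be re-expressed if the environment changed.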
D. EXPLICITLY DEFINED INTRONS AND DESTRUCTIVE CROSSOVER 


In [Ref 18], Peter Nordin, Frank Francone and Wolfgang Banzhaf investigated introns in
GP populations. Introns in GP are evolved code fragments that do not affect the fitness of
the individual. They divided introns into two categories: Explicitly Defined Introns
(EDIs) and Implicit Introns (IIs). An EDI is a structure that is specifically added into
the program structure; it does not affect the fitness calculation of the individual, but
it does affect the probability of crossover between adjacent blocks of evolved code. The
program structures containing EDIs were then run through the CGPS mentioned previously.
IIs, by contrast, are introns that emerge from the code itself through the process of
evolution.

They identified many classes of code segments that behaved as introns, but chose to
evaluate code segments that (1) did not contribute to the fitness, and (2) had the
property that each node could be replaced by a no-operation (noop) instruction without
affecting the output for any of the fitness cases. The number of introns in an individual
is counted by making the noop substitution for each node in turn and counting the number
of times there is no effect on the fitness. The effective length of the individual is the
absolute length of the individual’s chromosome minus the number of introns.
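The counting procedure above can be sketched in Java on a tiny linear register program, a simplified stand-in for the machine code individuals of the CGPS. The instruction format, the three registers and the class name are assumptions for illustration. Each instruction is replaced by a nop in turn; if the output is unchanged for every fitness case, that instruction counts as an intron, and the effective length is the total length minus the intron count.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of intron counting by no-op substitution on a tiny linear register program.
class IntronCounter {
    // r[dst] = r[a] op r[b]; "nop" does nothing.
    static final class Ins {
        final String op; final int dst, a, b;
        Ins(String op, int dst, int a, int b) { this.op = op; this.dst = dst; this.a = a; this.b = b; }
    }

    // r0 holds the input x and the output, r1 is the constant 1, r2 is scratch.
    static double run(List<Ins> prog, double x) {
        double[] r = {x, 1.0, 0.0};
        for (Ins i : prog) {
            switch (i.op) {
                case "add": r[i.dst] = r[i.a] + r[i.b]; break;
                case "mul": r[i.dst] = r[i.a] * r[i.b]; break;
                case "nop": break;
                default: throw new IllegalArgumentException("unknown op " + i.op);
            }
        }
        return r[0];
    }

    // An instruction counts as an intron if replacing it with a nop leaves the
    // output unchanged for every fitness case.
    static int countIntrons(List<Ins> prog, double[] fitnessCases) {
        int introns = 0;
        for (int k = 0; k < prog.size(); k++) {
            List<Ins> trial = new ArrayList<>(prog);
            trial.set(k, new Ins("nop", 0, 0, 0));
            boolean unchanged = true;
            for (double x : fitnessCases) {
                if (run(trial, x) != run(prog, x)) { unchanged = false; break; }
            }
            if (unchanged) introns++;
        }
        return introns;
    }

    // Effective length = absolute length minus the number of introns.
    static int effectiveLength(List<Ins> prog, double[] fitnessCases) {
        return prog.size() - countIntrons(prog, fitnessCases);
    }

    // Computes x*x + 1; the first instruction writes a register that is never read.
    static List<Ins> example() {
        return List.of(new Ins("mul", 2, 1, 1),   // r2 = 1 * 1   (intron)
                       new Ins("mul", 0, 0, 0),   // r0 = x * x
                       new Ins("add", 0, 0, 1));  // r0 = r0 + 1
    }
}
```

For the example program and fitness cases {0.5, 2.0, 3.0}, only the first instruction is an intron, so the effective length is 2 out of 3.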

They discovered that both types of introns protect an entire individual or code block
against the effects of destructive crossover. EDIs and IIs were also found to work
together: EDIs helped to keep IIs around until the perfect individual was found, at which
point their numbers dropped to nearly zero. That drop-off may be a good indication of
when to stop training the population.
IV. FORMAL DEFINITION OF THE PROBLEM


Evolutionary biology and genetics are very complex subjects; it is necessary to
understand each part of a highly specialized, complicated system and how the parts
interact to affect the whole. There are many questions that biological and genetic
researchers have yet to answer adequately as to why things are the way they are and how
they work. Because this thesis attempts to bring together the otherwise unrelated
disciplines of computer science and biology/genetics, it is crucial that the chosen
genetic mechanisms have been either proven or adequately theorized. If the mechanisms
chosen are not a cause of evolutionary changes in organisms, then the approach of this
thesis is faulty. If the mechanisms cannot be clearly defined, then it will be difficult,
if not impossible, to express their functionality unambiguously as an algorithm that can
be implemented on a computer.

Of the three previously stated genetic mechanisms (degeneracy, intron/exon regions,
and dominant/recessive traits), only the latter two will be implemented. Although these
mechanisms have been implemented individually in the previously discussed research, no
research was found that combines the two. The mechanism required to implement
degeneracy is very complex and outside the scope of this thesis. To demonstrate that the
specialized GA works, it will be applied to a problem on which a traditional genetic
algorithm/program has been used successfully (i.e., finding the value of a series
expansion). If it can be shown that this specialized GA can evolve a working program, the
next question is whether it performs better than, the same as, or worse than a traditional
GA/GP.

As covered in Chapter I, this is a new GA approach. As Chapter II shows, though, there
are many variations of GAs and other EC methods, and it is arguable that this new
approach is actually a GP because of its implementation. Each of these categories uses
different characteristics and parameters. Experimental results will help to determine
where this new GA best fits, but even without knowing in advance which characteristics
and parameters will be most important, there are general questions that need to be
answered.

As mentioned in Chapter I, the three basic questions of what initial population size to
use, what probability of mutation to use, and how many generations must be produced
before reliable results are obtained must be addressed¹². Controlled runs, in which only
one of these variables is changed, are performed in order to establish a working
baseline. These variables are simply provided to the program before the start of the run.
Initial population sizes of 20, 50, 100, 500, 1000 and 5000 individuals are commonly used
throughout the literature and so are used here. The probability of mutation most often
used in the literature is 0.001. Because this experiment is particularly interested in the
effects of mutation, it is necessary to increase the occurrence of mutation, so values of
0.01, 0.1 and 1 are used¹³. With regard to the number of generations, between 500 and 1000
generations has typically proven to be enough. While conducting the control experiments,
though, some runs were allowed to reach 10,000 generations. This series should give
enough data to determine the minimum number of generations required for reliable
results¹⁴.

One important question with respect to GAs and GP is the representation of
individuals. Representation in GAs is characterized by the chromosome, whereas in GP,
representation is characterized by the functions, terminals and data on which they
operate. In this paradigm, there are two representations, the genotype and the phenotype,
with the genotype actually being a program. The phenotype is the result that is produced
by “running” the genotype. Since the implementation is done using a GP tool, the
genotype is represented by a parse tree holding randomly chosen values from a function
and terminal set. The GP tool’s code has been modified to define introns (other than
those implicitly produced) and dominant/recessive genes. Each individual’s genotype is
randomly generated using the specialized GA, so that some individuals express recessive
traits. Chapter V describes the system and the modifications that were made.

Another important aspect of GAs and GP is choosing and testing the fitness function.
Fitness functions usually define boundary conditions, but how best to choose a fitness
function is not the primary focus of this thesis¹⁵. The fitness function is important
because it determines the genetic program’s ability to evolve. The focus of the thesis,
though, is to see the effect of incorporating dominant/recessive pairs and introns within
the structure of an evolving program. Because this research aims to use this specialized
GA to evolve a program, the GA must be applied to a problem for which the sufficiency
property can be fulfilled (see Chapter II, Section C). Regardless of the problem being
solved, one measure of fitness is whether or not an individual finds the correct answer.
Additionally, partial credit needs to be given to individuals that come close to finding
the answer, and extra credit should be assigned to individuals that find the answer most
efficiently, i.e., in the fewest instructions executed. This implies that, unless the value
of the output is null, every individual starts with a value from which points are
subtracted or to which points are added, depending on the fitness evaluation, before any
extra credit is awarded.

12 Population size is intimately tied to the problem being solved, and each problem has a minimum
necessary size in order to reliably find a solution. As it is not yet clear whether this specialized GA
works, population sizes that are common in the literature are used.

13 0.001% mutation is much larger than that which occurs naturally. The goal is to demonstrate the
function of the intron regions, so the percentages must be higher in order to achieve some observable
difference within a reasonable number of generations.

14 Allowing the population to run indefinitely will continue to produce changes. Generation counts similar
to those of more traditional GAs are used in order to make the results from the new GA explored in this
thesis more comparable with those of its older counterpart.

This new GA is applied to a series expansion. This problem involves comparison with a
known solution: the series expansion can be evaluated for a given value of x by
expanding the series for a certain number of terms. An individual’s fitness can be
determined by the absolute difference between the “real” answer (i.e., one computed on a
calculator) and the individual’s return value. Since a series expansion requires a
minimum number of operations, individuals that find a good solution in the fewest
operations should receive a higher fitness than individuals that find the same answer in
more operations.
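A minimal Java sketch of this fitness measure follows, using the series x^n + x^(n-1) + ... + x + 1 from the test runs. The class and method names are hypothetical, and ranking individuals by error first and then by operation count is one simple way to reward shorter solutions; it is an assumption, not necessarily the exact scheme used in the experiments.

```java
// Hypothetical sketch of the series-expansion fitness described above.
class SeriesFitness {
    // The "real" answer x^n + x^(n-1) + ... + x + 1, computed with Horner's rule.
    static double target(double x, int n) {
        double sum = 0.0;
        for (int i = 0; i <= n; i++) sum = sum * x + 1.0;
        return sum;
    }

    // Raw error: absolute difference between the real answer and the return value.
    static double error(double x, int n, double returned) {
        return Math.abs(target(x, n) - returned);
    }

    // Negative when individual A is fitter than B: smaller error wins, and for
    // equal error the individual using fewer operations wins.
    static int compare(double errorA, int opsA, double errorB, int opsB) {
        int byError = Double.compare(errorA, errorB);
        return byError != 0 ? byError : Integer.compare(opsA, opsB);
    }
}
```

For x = 3 and n = 2 the target is 13, so an individual returning 10 has an error of 3. Horner's rule also shows the minimum work involved: n multiplications and n additions.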

In order to determine whether or not the specialized GA works, it is necessary to 
be able to determine the average fitness of the population, and the fitness and structure of 
the most fit individual over the variety of control parameters (e.g., population size, 
generations and probability of mutation). In the case of the series expansion problem, the 
absolute difference between the real answer and the population/best individual answer 
should be very small if the individual’s fitness is to be considered a good one. 

Once the introns are incorporated into the program, it is necessary to track not only the
average fitness and the fitness and structure of the best individual, but also the number
of introns in the best-fit and worst-fit individuals, as well as the average number of
introns per individual in the population. If introns are beneficial in finding better
solutions, then the best individual should have more of them, most of the population will
have some of them, and the worst will have few, if any.

15 One of the original inspirations for this thesis was the idea of increasing the search space by allowing
“dead” individuals. A “dead” individual is defined as one with a low or zero fitness value, which would
prevent it from being selected to procreate in a Darwinian world. The “dead” individuals would be allowed
to remain within the population and would be subject to genetic operations. Although that idea did not come
to fruition, it is still intriguing.

Dominance should be implemented separately from introns and tested before the 
two are combined within the environment. In order to determine the effect of dominance 
on the population, it is necessary to determine the number of dominant and recessive 
genes in the best-fit and worst-fit individuals, as well as the average number of dominants 
and recessives within the population. 

When the two are combined, all of the above data must be tracked. To determine
whether they work, each stage must be compared both with the original test runs and
with the results of the preceding tests.

In summary, the fundamental question this thesis attempts to answer is whether this
specialized GA works, that is, whether it succeeds in creating a population of individuals
capable of finding a program that, when executed, solves the problem. Assuming this
specialized GA is shown to work, the next step is to determine the parameters under
which it performs best.
V. DESCRIPTION OF THE SYSTEM AND METHODS USED TO DETERMINE
EVOLUTION


A. GPSYS-1.1 


GPSYS-1.1 was written by Adil Qureshi in 1997 at the Department of Computer
Science, University College London, United Kingdom [Ref 19]. The primary reason for
choosing it over others is that it is written in Java, which this author understands
better than the C and C++ languages used by tools such as GP-Quick [Ref 20], GALib
[Ref 21], lilgp [Ref 22], Genitor [Ref 23], Avida [Ref 24] and GPCPP [Ref 25]. GPSYS is a
Strongly Typed GP with many built-in primitives, generic functions and terminals. It
includes commented source code along with javadoc documentation for all classes. The
example problems provided with GPSYS were invaluable in learning how to use the system.
B. STRUCTURE OF GPSYS-1.1 


GPSYS uses tournament selection to choose the individuals that will be mutated, crossed
over and replaced within the population. By default, GPSYS chooses seven individuals at
random from the population to compete in each tournament¹⁶. The number of individuals
competing in each tournament is set by the tournamentSize variable defined in the
GPParameters class, which must be compiled before beginning a run in order for a new
tournament size to take effect. The GPParameters class is also where the probability of
mutation, the number of generations to evolve and the population size are defined.

The Population class holds the array of Individuals for each generation that will
undergo the operations of selection, crossover and mutation; the Individual class is the
evolved GP program. The Population class contains the evolve function as well as the
selection methods for tournaments, mutation and crossover. When the population evolves,
each new individual is created either by mutation or by crossover, but not both. These
random choices are driven by Java’s Random class, seeded with a value provided at the
start of the run.


16 Selection is random and is not influenced by an individual’s fitness or any other attribute. 
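Tournament selection as described above can be sketched in Java. This is not GPSYS's code: the class name is hypothetical, competitors are assumed to be drawn without replacement, and higher fitness is assumed to be better. Competitors are drawn uniformly at random, ignoring fitness, and only then ranked.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of tournament selection as described above (not GPSYS's code).
class Tournament {
    // Draws tournamentSize distinct individuals uniformly at random, ignoring fitness,
    // then returns their indices ordered from most fit to least fit.
    // Higher fitness is assumed to be better; tournamentSize must not exceed the population size.
    static int[] run(double[] fitness, int tournamentSize, Random rng) {
        int n = fitness.length;
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        // Partial Fisher-Yates shuffle: the first tournamentSize slots become the competitors.
        for (int i = 0; i < tournamentSize; i++) {
            int j = i + rng.nextInt(n - i);
            int tmp = idx[i]; idx[i] = idx[j]; idx[j] = tmp;
        }
        Integer[] competitors = new Integer[tournamentSize];
        for (int i = 0; i < tournamentSize; i++) competitors[i] = idx[i];
        Arrays.sort(competitors, (a, b) -> Double.compare(fitness[b], fitness[a]));
        int[] ranked = new int[tournamentSize];
        for (int i = 0; i < tournamentSize; i++) ranked[i] = competitors[i];
        return ranked;
    }
}
```

In terms of the operators described next, element 0 of the result is the individual cloned for mutation (or the mother), element 1 is the father, and the last element is the individual to be replaced. Seeding the Random object makes a run reproducible.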
Every Individual contains a Chromosome and is assigned a fitness value based on the
Fitness class. The Chromosome class represents the tree whose nodes are members of the
Gene class, and it also defines the genetic operators. If mutation is chosen as the
evolving mechanism, a tournament is run, from which the most and least fit individuals
are selected. The most fit individual is cloned, and the clone is mutated by selecting a
branch of the tree at random and replacing it with a new branch. The new branch is
generated in one of two ways. One method generates a tree of the maximum specified
depth, which is defined in the GPParameters class. The second method builds a tree in
which the probability of encountering a leaf is the same as that of an internal node.
Which method is used is determined probabilistically from the random number generator.
The mutated clone then replaces, within the population, the least fit individual selected
by the tournament.
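The mutation step can be sketched in Java with a minimal tree type. The names Node and SubtreeMutation, the two-function/two-terminal set, and the fair coin used to choose between the two generation methods are assumptions, not GPSYS internals. The "full" method grows every branch to the given depth; the "grow" method makes each node a leaf or a function with equal probability until the depth runs out. The sketch does not cap the depth of the mutated tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Minimal tree for this sketch: "+" and "*" are functions, "x" and "1" are terminals.
class Node {
    final String label;
    final List<Node> kids = new ArrayList<>();
    Node(String label) { this.label = label; }

    Node copy() {
        Node c = new Node(label);
        for (Node k : kids) c.kids.add(k.copy());
        return c;
    }
    int size() { int s = 1; for (Node k : kids) s += k.size(); return s; }
    int depth() { int d = 0; for (Node k : kids) d = Math.max(d, k.depth()); return d + 1; }
}

// Hypothetical sketch of subtree mutation (not GPSYS's code).
class SubtreeMutation {
    static final String[] FUNCS = {"+", "*"};
    static final String[] TERMS = {"x", "1"};

    // full == true: every branch reaches the given depth ("full" method).
    // full == false: each node is a leaf or a function with equal probability ("grow" method).
    static Node generate(int depth, boolean full, Random rng) {
        boolean leaf = depth <= 1 || (!full && rng.nextBoolean());
        if (leaf) return new Node(TERMS[rng.nextInt(TERMS.length)]);
        Node n = new Node(FUNCS[rng.nextInt(FUNCS.length)]);
        n.kids.add(generate(depth - 1, full, rng));
        n.kids.add(generate(depth - 1, full, rng));
        return n;
    }

    // Clones the winner, then replaces one randomly chosen branch of the clone
    // with a freshly generated branch; the winner itself is left untouched.
    static Node mutate(Node winner, int maxDepth, Random rng) {
        Node clone = winner.copy();
        int target = rng.nextInt(clone.size());          // preorder index of the branch to replace
        Node fresh = generate(maxDepth, rng.nextBoolean(), rng);
        if (target == 0) return fresh;
        replaceAt(clone, target, fresh);
        return clone;
    }

    // Replaces the node at preorder position target (> 0) within node's subtree.
    private static void replaceAt(Node node, int target, Node fresh) {
        int pos = 1;
        for (int i = 0; i < node.kids.size(); i++) {
            Node k = node.kids.get(i);
            if (target == pos) { node.kids.set(i, fresh); return; }
            if (target < pos + k.size()) { replaceAt(k, target - pos, fresh); return; }
            pos += k.size();
        }
    }

    // One mutation step: the mutated clone of the best replaces the worst.
    static void step(List<Node> population, int best, int worst, int maxDepth, Random rng) {
        population.set(worst, mutate(population.get(best), maxDepth, rng));
    }
}
```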

If crossover is the chosen evolutionary mechanism, then a tournament is run, from which
three individuals are selected: the most fit (mother), the next most fit (father), and the
least fit. Incestuous crossovers are permitted, although a warning is given should one
occur¹⁷. The mother is cloned and a branch of the clone is randomly selected for
replacement. A branch from the father is chosen, copied, and placed into the clone,
creating the child¹⁸. If the child’s depth exceeds the maximum as a result of the
crossover, it is returned to its original state as simply a clone of its mother. The
worst-fit individual selected in the tournament is then replaced by the child.
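The crossover step can be sketched in Java as well. The class name is hypothetical and the trees carry a single data type, so the requirement that the father's branch return the same type as the replaced branch is not modelled here. A copy of a random branch of the father is grafted into a clone of the mother; if the child exceeds the maximum depth, a plain clone of the mother is returned instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the crossover step described above (single-typed trees).
class CrossoverSketch {
    static final class T {
        final String label;
        final List<T> kids = new ArrayList<>();
        T(String label) { this.label = label; }
        T copy() { T c = new T(label); for (T k : kids) c.kids.add(k.copy()); return c; }
        int size() { int s = 1; for (T k : kids) s += k.size(); return s; }
        int depth() { int d = 0; for (T k : kids) d = Math.max(d, k.depth()); return d + 1; }
        @Override public String toString() {
            if (kids.isEmpty()) return label;
            StringBuilder sb = new StringBuilder("(").append(label);
            for (T k : kids) sb.append(' ').append(k);
            return sb.append(')').toString();
        }
    }

    static T leaf(String label) { return new T(label); }
    static T fn(String label, T left, T right) { T t = new T(label); t.kids.add(left); t.kids.add(right); return t; }

    // Node at preorder position index (0 = root).
    static T nodeAt(T root, int index) {
        if (index == 0) return root;
        int pos = 1;
        for (T k : root.kids) {
            if (index < pos + k.size()) return nodeAt(k, index - pos);
            pos += k.size();
        }
        throw new IndexOutOfBoundsException(String.valueOf(index));
    }

    // Returns root with the node at preorder position index replaced by graft.
    static T replaceAt(T root, int index, T graft) {
        if (index == 0) return graft;
        int pos = 1;
        for (int i = 0; i < root.kids.size(); i++) {
            T k = root.kids.get(i);
            if (index < pos + k.size()) { root.kids.set(i, replaceAt(k, index - pos, graft)); return root; }
            pos += k.size();
        }
        throw new IndexOutOfBoundsException(String.valueOf(index));
    }

    // Clone the mother, graft in a copy of a random branch of the father, and
    // fall back to a plain clone of the mother if the child is too deep.
    static T crossover(T mother, T father, int maxDepth, Random rng) {
        T clone = mother.copy();
        int cut = rng.nextInt(clone.size());
        T graft = nodeAt(father, rng.nextInt(father.size())).copy();
        T child = replaceAt(clone, cut, graft);
        return child.depth() > maxDepth ? mother.copy() : child;
    }
}
```

Both parents are copied rather than modified, so the mother and father remain in the population unchanged while the child takes the worst individual's place.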

The Gene class is a node in the GP tree. Since GP trees are built from functions and
terminals, the GeneTerminal and GeneFunction classes extend the Gene class. Both classes
use the Primitive class, which represents a unit in the evolved program. This class is
extended by the Terminal and Function classes, where Terminal defines standard Java
primitives such as integer, float, and object. The Function class can be extended in
order to implement user-defined functions. GPSYS comes with many predefined functions,
such as arithmetic and logical operations.


17 An incestuous crossover was never encountered during these runs, but nothing within the system
prevents one from happening.


18 The branch chosen from the father must return the same type as that selected from the clone for 
replacement. 
C. HOW TO INCORPORATE INTRONS INTO GPSYS


Because GPSYS-1.1 is fully object oriented, it is extensible, which is a very desirable
feature for making changes. One significant problem with a Java implementation is the
lack of pointers, which makes it difficult to assign the null value¹⁹. The first change made
by this research was to extend the Primitive class, which defines the fundamental types
used within the system. The Primitive class is actually generic, with extensions defining
the Terminals and Functions that can be used without users having to define their own.
The Function class was extended with a no-operation (noop) instruction, which returns no
value, takes no arguments, and does nothing. This function is used at the root node of an
intron. It can also be found within the subtree of the intron, but the highest node
connecting the intron to the parse tree must be a noop. Conveniently, the Type class, which
defines all the off-the-shelf usable types, also includes a no-argument constructor
defined as “No Type.”

The Gene class was modified with a boolean flag indicating whether or not the gene is
an intron. All terminals set this flag to false, because terminals are the values used by
the functions. The Gene class is generic and is extended by the GeneTerminal and
GeneFunction classes. A no-operation class called Nop extends the Function class. The
user can ensure that introns are created by including Nop as one of the available
functions when setting the GP parameters (GPParameters class) before compiling and
running the population.

The Population class had to be modified in order to collect the necessary data on the
entire population (the average number of introns per individual), as well as to retain the
fitness, number of introns, and structure of the best and worst individuals.
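The intron flag and the population statistics described in this section can be sketched in Java. These are hypothetical stand-ins, not GPSYS's own Gene, GeneTerminal, GeneFunction or Population classes. The sketch counts each gene whose function is "nop" as one intron and leaves unflagged the genes beneath it; whether the thesis counted whole subtrees under a noop is not stated, so this is an assumption. Higher fitness is also assumed to be better.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for genes carrying an intron flag; these are not GPSYS's own classes.
abstract class FlaggedGene {
    final List<FlaggedGene> children = new ArrayList<>();

    abstract boolean isIntron();

    // Number of genes in this subtree whose intron flag is set.
    int countIntrons() {
        int count = isIntron() ? 1 : 0;
        for (FlaggedGene child : children) count += child.countIntrons();
        return count;
    }
}

// Terminals always report false, as described above.
class FlaggedTerminal extends FlaggedGene {
    final String name;
    FlaggedTerminal(String name) { this.name = name; }
    boolean isIntron() { return false; }
}

// A function gene is flagged as an intron when its function is the no-operation "nop".
class FlaggedFunction extends FlaggedGene {
    final String function;
    FlaggedFunction(String function, FlaggedGene... kids) {
        this.function = function;
        for (FlaggedGene kid : kids) children.add(kid);
    }
    boolean isIntron() { return function.equals("nop"); }
}

class IntronStats {
    // Average number of introns per individual across the population.
    static double averageIntrons(List<FlaggedGene> population) {
        if (population.isEmpty()) return 0.0;
        int total = 0;
        for (FlaggedGene root : population) total += root.countIntrons();
        return (double) total / population.size();
    }

    // Intron count of the individual with the best fitness (higher assumed better).
    static int intronsOfBest(List<FlaggedGene> population, double[] fitness) {
        int best = 0;
        for (int i = 1; i < fitness.length; i++) if (fitness[i] > fitness[best]) best = i;
        return population.get(best).countIntrons();
    }
}
```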
D. HOW TO INCORPORATE DOMINANT TRAITS INTO GPSYS 


Implementing dominant/recessive traits required modifying many of the same classes.
The first change was to extend as many of the predefined functions as possible. The
predefined functions are the dominants; these functions, which are all classes, are
extended to create the recessives. When present, each recessive function subtracts a
value from the individual’s fitness.


19 Only a Java Object can be assigned a null value. Primitive types cannot be assigned null, nor can
primitives be cast to an Object that could then be set to null.
The Gene class is also modified to include a boolean flag that is true if the gene is
dominant and false otherwise²⁰. An additional allele flag identifies the gene as
homozygous dominant, homozygous recessive or heterozygous. During crossover, this flag
is checked in both parents to ensure that the offspring produced is a Mendelian
possibility, with the probability of creating each type equal to its Mendelian
probability based on the law of dominance (see Appendix C).
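The Mendelian bookkeeping described above can be sketched in Java for a single gene. The enum and class names are hypothetical. Each parent passes on one of its two alleles with equal probability, so the exact offspring probabilities follow from each parent's chance of transmitting the dominant allele, and a sampled child is always a Mendelian possibility for its parents.

```java
import java.util.Random;

// Hypothetical sketch of Mendelian inheritance for a single dominant/recessive gene.
enum Genotype {
    HOMOZYGOUS_DOMINANT, HETEROZYGOUS, HOMOZYGOUS_RECESSIVE;

    // Probability that this parent passes on the dominant allele.
    double dominantAlleleProbability() {
        switch (this) {
            case HOMOZYGOUS_DOMINANT: return 1.0;
            case HETEROZYGOUS: return 0.5;
            default: return 0.0;
        }
    }

    // Only the homozygous recessive genotype expresses the recessive trait.
    boolean expressesDominant() { return this != HOMOZYGOUS_RECESSIVE; }
}

class MendelianCross {
    // Exact probability that parents p1 and p2 produce the given child genotype.
    static double probability(Genotype p1, Genotype p2, Genotype child) {
        double a = p1.dominantAlleleProbability(), b = p2.dominantAlleleProbability();
        switch (child) {
            case HOMOZYGOUS_DOMINANT: return a * b;
            case HOMOZYGOUS_RECESSIVE: return (1 - a) * (1 - b);
            default: return a * (1 - b) + (1 - a) * b;
        }
    }

    // Samples a child genotype: each parent independently contributes one allele.
    static Genotype cross(Genotype p1, Genotype p2, Random rng) {
        boolean fromP1 = rng.nextDouble() < p1.dominantAlleleProbability();
        boolean fromP2 = rng.nextDouble() < p2.dominantAlleleProbability();
        if (fromP1 && fromP2) return Genotype.HOMOZYGOUS_DOMINANT;
        if (fromP1 || fromP2) return Genotype.HETEROZYGOUS;
        return Genotype.HOMOZYGOUS_RECESSIVE;
    }
}
```

Crossing two heterozygotes gives the familiar 1:2:1 ratio (probabilities 0.25, 0.5 and 0.25), while crossing the two homozygotes always yields a heterozygote.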

Just as with the introns, the Population class had to be further modified in order to
determine the average number of dominant and recessive traits in the population, as well
as in the best-fit and worst-fit individuals.
E. ANALYSIS STRATEGY


To recapitulate Chapter IV and summarize the above discussion, the following tables
depict the experiments run and the data collected. For the test runs, which performed a
series expansion without any modification to GPSYS, the following data was collected.


Parameter / Data              Values / Description
Probability of Mutation       Values tested were 0.0, 0.001, 0.01, 0.1, 0.5, 1.0
Number of Generations         Values tested were 10, 50, 100, 250, 500, 1000, 5000, 10000
Population Size               Values tested were 20, 50, 100, 500, 1000, 5000, 10000
Average Complexity            Complexity refers to the number of nodes in the evolved tree
Best-Fit Fitness              The best answer in the population
Best-Fit Complexity           For a comparison to the average complexity


Table 1. Test Run Parameters Used and Data Produced





20 Terminals cannot be dominant or recessive since they are acted on, so the flag is set to false for all
terminals.


21 GPSYS tended to crash while running the larger populations for larger numbers of generations. Data is
continuously written to file as the run progresses, so some data is saved even in the event of a failure.
Because of this tendency to fail, tests on the larger populations (5,000 or 10,000 individuals) run for more
generations were either not conducted, or the data used is that which was collected up to the time of the
failure.
The fitness values were determined by the following functions:

Series Expansion, where $n$ is random but does not exceed 10, and $0.0 < x < 25$:

$$x^{n} + x^{n-1} + \cdots + x + 1 = \sum_{i=0}^{n} x^{i}$$

Fitness Function:

$$\text{Fitness} = \left|\, \sum_{i=0}^{n} x^{i} - \text{individual's return value} \,\right|$$

Table 2. Fitness Functions for Test Run


Once the introns were added, the following data was necessary: 
Probability of Mutation | Values tested were 0.0, 0.001, 0.01, 0.1, 0.5, 1.0
Number of Generations | Values tested were 1000
Population Size | Values tested were 500, 1000
Average Fitness | Evaluated by the Population class
Average Complexity | Complexity refers to the number of nodes in the evolved tree
Best-Fit Fitness | The best answer in the population
Best-Fit Complexity | For a comparison to the average complexity
Worst-Fit Fitness | The worst answer in the population
Worst-Fit Complexity | For a comparison to the average complexity
Best-Fit Number of Introns | For a comparison to the average population and worst
Worst-Fit Number of Introns | For a comparison to the average population and best
Average Number of Introns | Evaluated by the Population class


Table 3. Parameters Used and Data Produced for Tests after Introns are Included 




The dominant/recessive genes required the following data: 


Probability of Mutation | Values tested were 0.0, 0.001, 0.01, 0.1, 0.5, 1.0
Number of Generations | Values tested were 1000
Population Size | Values tested were 1000
Average Fitness | Evaluated by the Population class
Average Complexity | Complexity refers to the number of nodes in the evolved tree
Best-Fit Fitness | The best answer in the population
Best-Fit Complexity | For a comparison to the average complexity
Worst-Fit Fitness | The worst answer in the population
Worst-Fit Complexity | For a comparison to the average complexity
Average Number of Dominant Genes


Table 4. Parameters Used and Data Produced for the Evaluation of Dominant/Recessive Pairs 





The combination of the two required the following data to be collected: 


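Probability of Mutation | Values tested were 0.0, 0.001, 0.01, 0.1, 0.5, 1.0
Number of Generations | Values tested were 1000
Population Size | Values tested were 1000
Average Fitness | Evaluated by the Population class
Average Complexity | Complexity refers to the number of nodes in the evolved tree
Best-Fit Fitness | The best answer in the population
Best-Fit Complexity | For a comparison to the average complexity
Worst-Fit Fitness | The worst answer in the population
Worst-Fit Complexity | For a comparison to the average complexity
Best-Fit Number of Introns | For a comparison to the average population and worst
Worst-Fit Number of Introns | For a comparison to the average population and best
Average Number of Introns | Evaluated by the Population class
Best-Fit Number of Dominant Genes | For comparison to the average/worst dominants
Best-Fit Number of Recessive Genes | For comparison to the average/worst dominants
Worst-Fit Number of Dominant Genes | For comparison to the average/best dominants
Worst-Fit Number of Recessive Genes | For a comparison to the average population and best
Average Number of Dominant Genes | Evaluated by the Population class
Average Number of Recessive Genes | Evaluated by the Population class


Table 5. Parameters Used and Data Produced for Dominant/Recessive and Intron Runs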
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VI. RESULTS 


A. GEOMETRIC SERIES EXPANSION CONTROL EXPERIMENTS 


The purpose of this first set of control experiments was to determine the system's behavior and the population size and number of generations with which it performed best. The criterion used to judge performance was finding the fittest individual in the fewest generations. The results of these runs were then used to choose the population size and number of generations for the intron and dominant/recessive gene experiments.

As described in Chapter IV, the population sizes varied between 20 and 10,000, with runs of up to 10,000 generations. The probabilities of mutation used were 0.001, 0.01, 0.1, and 1.0. Each run used the same random seed in order to isolate the effect that population size and probability of mutation had on the overall run. All tests were conducted on a Dell™ Dimension XPS D300 with a Pentium II processor running Microsoft Windows 95™ as the operating system. All code was compiled with the Java Development Kit version 1.2.
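The role of a fixed seed can be illustrated with `java.util.Random`, which produces the same pseudo-random sequence whenever it is constructed with the same seed. Whether GPSYS draws its random numbers this way is an assumption. The class and method below are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/** Shows why a fixed seed makes runs repeatable (illustrative; not GPSYS code). */
class SeedDemo {

    /** Draws 'count' mutate/don't-mutate decisions with probability pMut from a seeded generator. */
    static boolean[] mutationDecisions(long seed, double pMut, int count) {
        Random rng = new Random(seed);   // same seed -> same pseudo-random sequence
        boolean[] decisions = new boolean[count];
        for (int i = 0; i < count; i++) {
            decisions[i] = rng.nextDouble() < pMut;
        }
        return decisions;
    }

    public static void main(String[] args) {
        boolean[] runA = mutationDecisions(42L, 0.1, 1000);
        boolean[] runB = mutationDecisions(42L, 0.1, 1000);
        System.out.println(Arrays.equals(runA, runB)); // true
    }
}
```

A fixed seed guarantees only that the generator's sequence repeats. Runs with different population sizes or mutation rates consume that sequence differently, so they still follow different evolutionary paths.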


Population Size | Generations Run | Best Fitness | Generation Discovered


10000 1085287 6284 
50 10000 2023.009 


| 100 | 10000 2023012 
| 250 | 10009 0051 


2. 504d 306.003 


3330 0.046 
0.047 
10000 0.033 


Table 6. Results of Series Expansion Runs with a Probability of Mutation of 0.001 





In these initial runs, the smaller starting populations (20 and 50) reached the termination criterion of 10,000 generations but never achieved fitness values as good as those of the larger populations. They continued to be tested, although they were eliminated in later runs.
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Population Size | Generations Run | Best Fitness | Generation Discovered
20 10000 143258
48
| 100 | 10000 6108
305.983 76
| 1000 | 00 | sos |
| 1000 | a0 | oss |















0.034 





Table 7. Results of Series Expansion Runs with a Probability of Mutation of 0.01 


Larger populations (5,000 and 10,000 individuals) evolving for more generations usually caused a java.io exception that stopped the run before it reached the predefined termination condition of 10,000 generations. Data was continuously collected up to the time of each failure, which explains the differences in the number of generations run for each population size. Eventually, the 10,000-member population was eliminated from testing because it took significantly more time to complete, always crashed, and did not appear to provide significantly better results than a 5,000-member population.


20 | 10000 
SS 
280 | to000 | s05.008 | 

[100 ee 










1000 1000 0.030 ie 
940 0.048 
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Table 8. Results of Series Expansion Runs with a Probability of Mutation of 0.1 
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| 1000 0.044 
| soo | tooo] oss | 
Table 9. Results of Series Expansion Runs with a Probability of Mutation of 1.0 














0.055 


Comparing the best fitness values in the tables with the generations in which they were discovered shows that populations of 1,000 and 5,000 achieved better results in fewer generations. With the exception of Table 7, the population size of 1,000 always achieved slightly better fitness than that of 5,000 and did not crash within 1,000 generations. Thus, the remaining runs used populations of 1,000 evolved for 1,000 generations.

Two additional control runs were performed after settling on the population size and number of generations. The first run set the probability of mutation to zero in order to isolate the effects of crossover. The second used a 50% probability of mutation. The results are presented in Table 10.









Probability of Mutation | Population Size | Generations Run | Best Fitness | Generation Discovered


po | 900 | 100 | 305.980 
| os | to | 000 S| es |e 


2.944 
Table 10. Series Expansion Runs with Probabilities of Mutation of 0.0 and 0.5 






The first runs used a 0.001 probability of mutation. Figure 4 shows the average population fitness; for readability, only the results to 1,000 generations are displayed. The figure shows large variance in the average population fitness from generation to generation.²² The number of outliers increased with population size, so medians were used to measure central tendency. Figure 5 shows the results. Population sizes of 500 and 10,000 were removed from the graph because their median values (1.4 X 10' and 7.02 X 10°) made the other values unreadable.
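A few made-up values show why the median was preferred. A single extreme generation drags the mean far away from the typical value, while the median barely moves. The Java sketch below is illustrative only and assumes a non-empty data set.

```java
import java.util.Arrays;

/** Compares mean and median as measures of central tendency (illustrative). */
class CentralTendency {

    static double mean(double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    /** Median of a copy of the data (assumes values is non-empty); the caller's array is unchanged. */
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return (sorted.length % 2 == 1)
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        // Made-up per-generation averages with one extreme outlier.
        double[] avgFitness = {310.0, 305.0, 300.0, 1.0e9, 295.0};
        System.out.println(mean(avgFitness));   // dominated by the outlier
        System.out.println(median(avgFitness)); // 305.0
    }
}
```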


22 The outlier in Figure 4 plotted between generations 144 and 180 has a value greater than 1 X 10°°.
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Figure 4. Average Population Fitness for One Run of a Population Size of 20 with a 
Probability of Mutation of 0.001 
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Figure 5. Average Fitness Values with a Probability of Mutation of 0.001 
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There is a minimum number of operations that must be performed to find a good solution to a series expansion, so the best-fit individual's complexity should be the closest to that minimum (see Chapter V). Figure 6 shows that the average population complexity is very close to the best-fit complexity, but always less. That trend holds across the tests regardless of population size, probability of mutation, or number of generations, with the single exception marked in Table 11, which compares the average population complexity with the best-fit individual's complexity. Because the worst-fit individual's data was not collected in the control runs, one can only infer from this trend that individuals with significantly lower complexity are the least fit.
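Horner's rule offers one way to see why a correct individual needs a certain number of operations. It evaluates $x^n + x^{n-1} + \cdots + 1$ in nested form, and the sketch below counts the n multiplications and n additions it uses (the first multiplication, 1·x, is trivial). This gives an upper bound rather than a proven minimum, because the geometric series also has the closed form $(x^{n+1} - 1)/(x - 1)$. The sketch also assumes a function set containing + and *. The thesis's own analysis is in Chapter V, and the names below are hypothetical.

```java
/** Evaluates x^n + ... + x + 1 with Horner's rule and counts the arithmetic operations. */
class HornerSeries {

    /** Result holder: the value plus the number of multiplications and additions used. */
    static final class Result {
        final double value;
        final int multiplications;
        final int additions;

        Result(double value, int multiplications, int additions) {
            this.value = value;
            this.multiplications = multiplications;
            this.additions = additions;
        }
    }

    /** Horner form: starting from the leading coefficient 1, apply p = p * x + 1 n times. */
    static Result evaluate(double x, int n) {
        double p = 1.0;             // leading coefficient of x^n
        int mults = 0;
        int adds = 0;
        for (int k = 0; k < n; k++) {
            p = p * x + 1.0;        // one multiplication and one addition per degree
            mults++;
            adds++;
        }
        return new Result(p, mults, adds);
    }

    public static void main(String[] args) {
        Result r = evaluate(2.0, 3);  // 8 + 4 + 2 + 1
        System.out.println(r.value + " using " + r.multiplications
                + " multiplications and " + r.additions + " additions");
        // prints: 15.0 using 3 multiplications and 3 additions
    }
}
```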
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Figure 6. Comparison of Average Complexity with the Best-Fit Individual’s Complexity 
for Population Size of 1,000 and a 0.001 Probability of Mutation 
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Probability of Mutation | Population Size | Average Population Complexity | Best Individual's Average Complexity





os | to | 9.941 14.692 
ee 1.768 13.016 


* This is the only inconsistent point 


Table 11. Comparison of Average Population Complexities to Best-Fit Complexities 
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B. GEOMETRIC SERIES EXPANSION WITH INTRONS 


To answer the question posed in Chapter IV, namely whether explicitly defined introns function correctly, the author needed to find individuals in which the number of introns changed but the fitness remained the same. Figure 7 is an excerpt from the Excel™ spreadsheet into which all collected data was imported. It, along with other examples, shows that explicitly defined introns do work within GPSYS-1.1. The next question is what effect they have when applied to the same problem as the control runs.
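The minimal sketch below shows how the number of introns can change while fitness stays the same. It assumes that an explicitly defined intron is a subtree carried in the genome but skipped during evaluation. The node classes are hypothetical and are not the thesis's GPSYS-1.1 modification, which is described in Chapter V.

```java
/** Hypothetical expression trees with explicit introns that are skipped during evaluation. */
class IntronDemo {

    interface Node {
        double eval(double x);
        int size();         // total number of nodes, introns included
        int intronCount();  // number of intron regions in this subtree
    }

    /** The terminal x. */
    static final class Var implements Node {
        public double eval(double x) { return x; }
        public int size() { return 1; }
        public int intronCount() { return 0; }
    }

    /** A numeric constant terminal. */
    static final class Const implements Node {
        private final double value;
        Const(double value) { this.value = value; }
        public double eval(double x) { return value; }
        public int size() { return 1; }
        public int intronCount() { return 0; }
    }

    /** Binary addition. */
    static final class Add implements Node {
        private final Node left, right;
        Add(Node left, Node right) { this.left = left; this.right = right; }
        public double eval(double x) { return left.eval(x) + right.eval(x); }
        public int size() { return 1 + left.size() + right.size(); }
        public int intronCount() { return left.intronCount() + right.intronCount(); }
    }

    /** Wraps an exon with an intron subtree that is carried along but never evaluated. */
    static final class WithIntron implements Node {
        private final Node exon, intron;
        WithIntron(Node exon, Node intron) { this.exon = exon; this.intron = intron; }
        public double eval(double x) { return exon.eval(x); }   // intron ignored
        public int size() { return 1 + exon.size() + intron.size(); }
        public int intronCount() { return 1 + exon.intronCount() + intron.intronCount(); }
    }

    public static void main(String[] args) {
        Node plain = new Add(new Var(), new Const(1.0));                     // x + 1
        Node padded = new WithIntron(plain, new Add(new Var(), new Var()));  // same exon plus an intron
        System.out.println(plain.eval(3.0) + " " + padded.eval(3.0));        // 4.0 4.0
        System.out.println(plain.intronCount() + " " + padded.intronCount()); // 0 1
    }
}
```

Because only the exon is evaluated, the two trees receive identical fitness even though their intron counts and sizes differ.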
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Figure 7. Excerpt from Run Output Showing Explicitly Defined Introns 





















With a population size of 1,000 evolved for 1,000 generations, adding introns produced the best fitness values shown in Figure 8. Compared with the best fitness values from the control runs in Tables 6 through 10, these values most closely match the control runs with a population size of 500. Those control results were not very good, which is why 500 was not chosen as the population size for further tests.
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Figure 8. Best Individuals with Introns for Population Size of 1,000 


Since these results were not as good as those from the control runs, tests were run with a population size of 500 to see whether the best fitness found improved. The only population to produce a good answer was the one with a probability of mutation of 0.5. Because of this encouraging result, two additional tests were run with population sizes of 250 and 750, in an attempt to zero in on the best population size for the experiment, since 1,000 had performed poorly. However, they produced best fitnesses of 305.990 and 305.985, respectively (see Figure 9).

With no consistent discovery of a highly fit individual compared to the control runs, the conclusion is that population sizes of 500 and 1,000 with explicitly defined introns, run for 1,000 generations, do not produce as good a solution. Adding introns appears to cause the population to converge on local optima, and, as Table 12 shows, this convergence occurs very rapidly. However, the introns prevented the extreme variance in average population fitness seen in the control runs (Figure 4). The results from one run are shown in Figure 10.
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Figure 9. Best Individuals with Introns for Population Size of 500 
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Table 12. Generation of Discovery of Best Fitness 
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Figure 10. Single Run Average Population Fitness for Population Size 1,000 with a 
0.001 Probability of Mutation 


For the control runs, Table 11 compared the average population complexity with the average complexity of the best-fit individuals and showed that the best-fit individuals were almost without exception more complex than the population. Figure 11 compares the average population complexity with that of the best- and worst-fit individuals in the intron runs. In several bars the average population complexity remains smaller than the best-fit average complexity, but this is not consistently the case. An important point, though, is that the best-fit complexity remains relatively constant across all runs. This similarity is probably due to the quick convergence to local optima in all the tests; Figure 8 shows that all the local optima were very close in fitness value.

Since the fitness values of the best individuals converged so quickly to local optima, were the introns responsible? Figure 12 compares the average number of introns per member of the population with the averages for the best- and worst-fit individuals. In the runs with probabilities of mutation of 0.001, 0.01, and 0.1, the best-fit individuals never had an intron. In addition, very few introns were found in either the entire population or the worst individuals during those runs.
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Figure 11. Average Complexities of the Population, Best-Fit, and Worst-Fit for 
Population Size of 1,000 


Table 13 shows whether introns existed in the population, the best individual, and the worst individual, and whether they were ultimately excised from the tree. Introns consistently resided in the worst-fit individual longer than in the best-fit individual, but the worst-fit individual did not always have an intron; in most cases it had none. In Figure 12, the worst-fit individual has fewer introns than the average population.
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Figure 12. Average Number of Introns in the Population, Best Individual and Worst 
Individual 

















Probability of Mutation | Generation in which introns no longer found (Population) | Generation in which introns no longer found (Best Individual) | Generation in which introns no longer found (Worst Individual)
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* Introns were found sporadically throughout the rest of the generations, but were primarily absent.


Table 13. Comparison of Existence of Introns 
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Drawing on the background in Chapter II, the chromosomes (gene trees in this instance) are acting like the chromosomes of the lower eucaryotes, which have lost most of their introns in favor of more continuous genes. The consistently low occurrence of introns throughout the population indicates that the genetic operators, without being specifically directed to do so, replace the intron subtrees. This behavior is consistent with the “splice point” role of introns in DNA recombination.

In the tests with high probabilities of mutation, the average population fitness shows little variation compared with runs with a low probability of mutation, because the intron region is an area where mutations occur without affecting the overall fitness of the individual.²³ Although one run found a good solution, the general effect of adding introns is to prevent the drastic changes that result from crossover and mutation. With dramatic changes less likely, the population becomes more homogeneous, which explains the quick and permanent convergence to local optima.
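A back-of-the-envelope calculation makes this argument concrete. Assume that mutation picks a node uniformly at random and that complexity is not part of fitness. Then the fraction of nodes lying inside intron regions is the fraction of mutations guaranteed to be neutral. The model and names below are hypothetical, and the numbers are made up.

```java
/** Probability that a uniformly chosen mutation point lands inside an intron (hypothetical model). */
class NeutralMutation {

    /** Fraction of nodes in intron regions; a mutation there leaves fitness unchanged. */
    static double neutralFraction(int intronNodes, int totalNodes) {
        if (totalNodes <= 0 || intronNodes < 0 || intronNodes > totalNodes) {
            throw new IllegalArgumentException("need 0 <= intronNodes <= totalNodes and totalNodes > 0");
        }
        return (double) intronNodes / totalNodes;
    }

    /**
     * Upper bound on the per-individual rate of mutations that can change fitness:
     * mutations outside introns may still turn out to be neutral.
     */
    static double effectiveMutationRate(double pMut, int intronNodes, int totalNodes) {
        return pMut * (1.0 - neutralFraction(intronNodes, totalNodes));
    }

    public static void main(String[] args) {
        // Made-up tree: 16 nodes, 4 of them inside an intron region.
        System.out.println(neutralFraction(4, 16));           // 0.25
        System.out.println(effectiveMutationRate(0.5, 4, 16)); // 0.375
    }
}
```

The larger the intron share of a tree, the more a high nominal mutation rate is absorbed without any change in behavior.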


C. GEOMETRIC SERIES EXPANSION WITH DOMINANT/RECESSIVE 
GENES 


To answer the questions posed in Chapter IV about whether the addition of dominant/recessive genes works, the author needed to find individuals in which the number of recessive genes changed and the fitness increased. Figure 13 is an excerpt from the Excel spreadsheet into which all run data was imported. It, along with other examples, shows that dominant/recessive genes do work within GPSYS-1.1. Next, what effect do dominant/recessive genes have when applied to the same problem as the control runs?
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


Figure 13. Excerpt from Run Output Showing Dominant and Recessive Genes 








23 Introns will affect fitness if complexity is considered during evaluation. If two individuals evolve to the 
same answer, the less complex individual will be given the higher fitness. 
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With a population size of 1,000 run for 200 generations, adding dominant and recessive genes produced the best fitness values shown in Figure 14. Runs were limited to 200 generations because early analysis of the output from 1,000-generation runs showed convergence on the same value at approximately generation 20.

Compared with the runs with only introns enabled (see Figure 8), the runs with dominant/recessive genes had either similar or better results (55.011 and 1.044). Tests with a population size of 500 gave similar results (see Figure 15).

Table 12 shows that the addition of introns caused quick convergence on local optima. Table 14 is a similar comparison for populations that include dominant and recessive genes. Convergence was not as quick as in the intron cases. Although the fitness values for a population size of 1,000 are more closely aligned with those of the control runs (see Tables 6-10), convergence for a population size of 500 is much faster than in the control runs.
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Figure 14. Best Individuals with Dominant/Recessive Genes in Population Size of 1,000 
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Table 14. Generation of Discovery of Best Fitness 
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Figure 15. Best Individuals with Dominant/Recessive Genes for Population Size of 500
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Figures 16 and 17 compare the complexities of the average population, best-fit, and worst-fit individuals for population sizes of 1,000 and 500, respectively.
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Figure 16. Average Complexities of the Population, Best-Fit, and Worst-Fit for 
Population Size of 1,000 


The results in Figure 16 are consistent with the control runs in that the average population complexity is slightly less than the best-fit complexity. Figure 17 shows the same trend, except for the run with a probability of mutation of 1.0. In Table 11, which listed the complexities of the best-fit individuals from the control runs, one run at that probability of mutation was also inconsistent. As in the control runs, the difference in the values is very small.


(Bars grouped by Pmut = 0.0, 0.001, 0.01, 0.1, 0.5, 1.0; white: population, grey: best individual, black: worst individual.)


Figure 17. Average Complexities of the Population, Best-Fit, and Worst-Fit for 
Population Size of 500 


Figure 18 compares the average number of dominant and recessive genes in the population, best-fit, and worst-fit individuals. The results are as expected: there are generally more recessive genes in the worst-fit individual, represented by black bars in the figure, than in the best-fit individual and the average population. The very low occurrence of recessive genes at the higher probabilities of mutation is puzzling. Both the crossover and mutation mechanisms have equal opportunities to exchange a recessive gene for another recessive gene, so why is the occurrence so infrequent?

The dominant and recessive genes are modeled after Mendel’s law of dominance (see Appendix C). As a result, crossover is similar to meiosis and sexual recombination and follows Mendel’s law of segregation. In finite populations where Mendel’s law of segregation holds, the population is subject to genetic drift, a dispersive evolutionary force that removes genetic variation. Mutation usually counters its effects, but in this problem many mutations are neutral: because a recessive gene can replace a dominant one and vice versa, the end products are functionally the same. Thus, this particular implementation follows the neutral theory and is an example of non-Darwinian evolution. Although the crossover probabilities were modeled after Mendel’s law, the type of dominance exhibited is underdominance, which is influenced by genetic drift.
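The drift argument can be illustrated with the neutral Wright-Fisher model, a standard population-genetics model used here for illustration. It is not the thesis's crossover operator. The model assumes a diploid population of fixed size N with no selection and no mutation. Under those assumptions a tracked allele (think of a recessive gene) is eventually either lost or fixed purely by chance. The names and parameters below are hypothetical.

```java
import java.util.Random;

/** Neutral Wright-Fisher model: genetic drift without selection or mutation (illustrative). */
class WrightFisherDrift {

    /**
     * Simulates until the tracked allele is lost or fixed, or maxGenerations have passed.
     * Returns the final number of copies of the tracked allele among the 2N gene copies.
     */
    static int simulate(int populationSize, int initialCount, int maxGenerations, long seed) {
        Random rng = new Random(seed);
        int copies = 2 * populationSize;    // diploid: two gene copies per individual
        int count = initialCount;
        for (int g = 0; g < maxGenerations && count > 0 && count < copies; g++) {
            double freq = (double) count / copies;
            int next = 0;
            // Each gene copy in the next generation is drawn at random from the current pool.
            for (int i = 0; i < copies; i++) {
                if (rng.nextDouble() < freq) {
                    next++;
                }
            }
            count = next;
        }
        return count;
    }

    public static void main(String[] args) {
        int lost = 0;
        int fixed = 0;
        for (long seed = 1; seed <= 21; seed++) {
            int end = simulate(50, 50, 100_000, seed);   // start at allele frequency 0.5
            if (end == 0) {
                lost++;
            } else if (end == 100) {
                fixed++;
            }
        }
        System.out.println("lost: " + lost + ", fixed: " + fixed);
    }
}
```

Without mutation to reintroduce it, variation at the tracked gene disappears in every run. This mirrors the way recessive genes vanish from the populations described in the text.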
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Figure 18. Average Number of Dominant and Recessive Genes in the Population, Best 
Individual and Worst Individual for Population Size of 1,000 


Populations that include dominant and recessive genes share one feature with those that include introns: both prevent the large variance in average population fitness seen in the control runs. Figure 19 is a typical example, showing that the average population fitness falls within a smaller range.
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Figure 19. Single Run Average Population Fitness over 200 Generations for a Population 
Size of 1,000 with Probability of Mutation of 0.1 


D. GEOMETRIC SERIES EXPANSION WITH BOTH INTRONS AND 
DOMINANT/RECESSIVE GENES 


Although based only on runs using a single random seed, the results of the previous sections provide enough information about the effects of adding introns or dominant/recessive genes that the results from combining the two can be better explained. This thesis is most interested in what happens when both mechanisms are used in the same population. As such, the tests were run with twenty-one random seeds so that a statistically significant result could be found.

One overall finding is that the average population fitness did not exhibit the large variance seen in the control runs. Medians continued to be used to measure central tendency for consistency with the earlier experiments.

The following tables show the best-fit individual found for population sizes of 500 and 1,000 evolved for 200 generations with each of the six probabilities of mutation. Each set is followed by the statistics for the entire population in which that best-fit individual was found.
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Table 15. Best-Fit Individuals for Population Size of 500 with Probability of Mutation of 0.0 


The complete absence of introns in all the best-fit individuals was at first a possible cause for concern, because the tests run with only introns had at least one intron in the best individual (see Table 13). Of these runs, only three ever had a best-fit individual with an intron, and that intron was removed early in the run. More noteworthy is that introns were completely removed from the entire population for every random seed tested (see Tables 17 and 18).

The excision of some introns is consistent with the findings reported in Section B, but the complete removal of introns in populations with a low probability of mutation shows the effect of the genetic drift caused by the dominant and recessive genes. Genetic drift is also responsible for the removal of all recessive genes. In addition, the tournament selection process would rid the population of a portion of the individuals with introns and recessives, because the least-fit individual selected in any tournament is the one replaced as a result of crossover.
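A minimal sketch of the replacement step just described follows. It assumes a steady-state scheme in which a few individuals are drawn at random, the fitter entrants breed, and the slot of the least-fit entrant receives the offspring. The tournament size, the index-array representation, and all names are assumptions, not GPSYS's API.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical steady-state tournament: the least-fit entrant's slot receives the offspring. */
class TournamentReplacement {

    /** Picks k distinct indices uniformly at random using a partial Fisher-Yates shuffle. */
    static int[] sample(int populationSize, int k, Random rng) {
        if (k < 1 || k > populationSize) {
            throw new IllegalArgumentException("need 1 <= k <= populationSize");
        }
        int[] idx = new int[populationSize];
        for (int i = 0; i < populationSize; i++) {
            idx[i] = i;
        }
        for (int i = 0; i < k; i++) {
            int j = i + rng.nextInt(populationSize - i);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        return Arrays.copyOf(idx, k);
    }

    /** Fittest entrant: smallest fitness value (fitness is an error, so lower is better). */
    static int fittest(int[] entrants, double[] fitness) {
        int best = entrants[0];
        for (int i : entrants) {
            if (fitness[i] < fitness[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Least-fit entrant: largest fitness value; this is the slot that gets overwritten. */
    static int leastFit(int[] entrants, double[] fitness) {
        int worst = entrants[0];
        for (int i : entrants) {
            if (fitness[i] > fitness[worst]) {
                worst = i;
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        double[] fitness = {4.0, 9.0, 1.0, 7.0, 2.0};   // made-up toy values
        int[] entrants = sample(fitness.length, 3, new Random(1L));
        int parent = fittest(entrants, fitness);
        int loser = leastFit(entrants, fitness);
        // In a real run crossover would build a child from the fitter entrants;
        // here the loser's slot simply receives a made-up child fitness.
        fitness[loser] = 3.0;
        System.out.println("entrants " + Arrays.toString(entrants)
                + ", parent " + parent + ", replaced " + loser);
    }
}
```

Because the least-fit entrant is always the one overwritten, any unfit individual that happens to carry introns or recessive genes is steadily removed. Drift alone does not account for this effect.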
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Tables 19-22 show the results from using a 0.001 probability of mutation, and Tables 23-26 show the results from using a 0.01 probability of mutation. The two sets of results are nearly identical: such a small probability of mutation is not strong enough to counter the effects of the drift. The tables are provided for completeness.
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Table 16. Best-Fit Individual for Population Size of 1,000 with Probability of Mutation of 0.0 
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Table 17. Population Statistics for Population Size of 500 with Probability of Mutation of 0.0 
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Table 18. Population Statistics for Population Size of 1,000 with Probability of Mutation of 0.0 
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Table 19. Best-Fit Individual for Population of 500 with Probability of Mutation of 0.001 
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Table 20. Best-Fit Individual for Population Size of 1,000 with Probability of Mutation of 0.001 
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Median Population Fitness | Median Population Complexity | Median Population Dominants | Median Population Recessives | Generation in which Recessives Disappeared from Population | Median Population Introns | Generation in which Introns Disappeared from Population


rr3oamo26 [11736 | 3588 | 1786+ Never | OOP 
[14498739 | 14.781 2.698 | 2.692 | Never [0 | 
pisssarrs | “vires | sr [0 ar 
r 4560987 | 11780 | 3380 | 0 —~*t 

Psiosoer|—inse[~s40n [8 fo 
1aa8a776[ 13.580_| 5408 | 0.888 [Never [0 +4) SO 
[14276082 | 11.780 | 4474 | 0908 [Never [0 | 2 
Pseoior [irae [4a80—[ 0910 [Never [0 
14321856 [ 11.760 | 4202 | 1.156] Never | 0 S*SO 
rings | ui.712| 3.582 __| 1.766 Never [| 0) 
715568897 | 15.160 [5.288 | 1.766 | Never [| 0+) 
714365300 [11.760 | 3590 | 1.792 | Never [0 
raai05%8 | 13.479) 44a [i498 —~+|~CNever =~ 
713988350 | _11.700_| 3584] 1.776 | Never | 0) 2 
2004814 [13503 | 5364 | 0.892 | Never | 0 *4Y 2 
Pareeniy [ine[—sa10 [0 Jt 9 
[6691426 | 11.724 | 4.460 | 0.880 | Never | 0+) 
354194 | 13.624 [5.376 | 0986 | Newer [0 * Oo 
ri2942572[_11784_| 3582 —~+21808 =| Never ~~ dS 
152967 [20.688 | 7.152 | 2.696 [Newer | 0 +| 
a a lS 


Median: 12942572 11.784 4.480 1.156 2


* Introns or recessive genes were found sporadically throughout the rest of the generations, but were primarily absent.
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Table 21. Population Statistics for Population Size of 500 with Probability of Mutation of 0.001 
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Median Population Fitness | Median Population Complexity | Median Population Dominants | Median Population Recessives | Generation in which Recessives Disappeared from Population | Median Population Introns | Generation in which Introns Disappeared from Population


Peereeia[ nas] 4a7O | 0902 | Never =| OPS 
[9760143| 11.794 [5394 | 0 ~+| 6 SiC 
[16008376 15.096 | 3303] 1742~| Neer =f OS 
9809607 11.776 | 3.603 | 1.783 | Never | 0) 12 
[15432350] _11.730_| 3587 | 1776 | Never | 0 | 13 
[25076858] 11.806 3577 | 1.828 | Never [0 +) is 
7 8656193] _13540_| 6267 | 0 | 32,~S~«dS~SCi 
~s865615| 13544 | 5300 | 0883 | Newer | 0+) > 
[sa11607|_13.552 | 3377 | 0897 | Never | 0+ 
33651367, 11.783 | 4.498 | 0.803 | Never | 0 -| 
[73936360 11.854 | 4483 | 0987 [Never | 0 | 3 
[a7336t4] 13526 | 3566 | 2.684 | Never | 0+ 
3223203 13.482_| _3565__| 2.685 | Never | 0 | 3) 
10384052, _1174_| 3373 «| 0 SC«dC tC 

3007887] 11.818 5395] 0 | 32 —SidsSCiStidC 
3804003, 13.568 | _3584 | 2.690 | Newer | 0 ~+| 

9818730) 11.750_| 4450 | 0910 | Never [0 | 6 
Toissi69| 11.778 | 4as1_| 0911 | Never | 0 | 6 
16454039 16896 | 6180 1.773 | Never | 0 | 7 
Trosarsso i748 4aas_[ 0929 | Never | 0 | 
[342313] 13464 6231 [0 ~~) 3s Sid Cd 


Median: 9809607 11.818 4.483
"Introns or recessive genes were found sporadically throughout the rest of the generations, but were 
primarily absent. 
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Table 22. Population Statistics for Population Size of 1,000 with Probability of Mutation of 0.001
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Table 23. Best-Fit Individual for Population of 500 with Probability of Mutation of 0.01 
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Table 24. Best-Fit Individual for Population Size of 1,000 with Probability of Mutation of 0.01 
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Median Population Fitness | Median Population Complexity | Median Population Dominants | Median Population Recessives | Generation in which Recessives Disappeared from Population | Median Population Introns | Generation in which Introns Disappeared from Population


7 aas9365| Tas S3eH POF Never | 0 *dYCNever’— 
3036917] 11.780 | 5.368 | 0.008 Never. | 0 | _Never 
[13270204] 11.800 5.384 | 0.008 Never. | 0+ Never 
14458805] 13.616 | 6296 | 0.004 | Never [0 _[| Never" 
13008576, 11.784 | 5360 | 0.006 | Never. | __0 | Never 
72921386 13.496 | 6236 | 0.006 | Never" | 0+ Never 
y4222781] 13.540 [3368 | 0892 Never | 0 | Never 
315493] 11.840 | 4518 | 0894 | Never [0 | _Never 
~3929053[ 13592 | 584 | 0.894 | Never [0 | Never 
Traro7i3s[_1.776 | _4482__| 0902 [Never [0 | Never" 
Tieaio70[ 13576 | 5372 | 0910 | Never [| 0 | _Never 
7312462] 11.752| 4450 0914 | _Never | 0 [| Never 
-“Soasa7ol 1.812 | 3626 [1.780 | Never | 0 | Never 
[16886785] 13.460 | 4438 | 1.790 Never | 0 | Never _ 
2095219[ 13.588 | 4a92__[ 1.796 [Never [0 | _Never 
[2267586] 13.568 | 460 | 1.81_| Never [0 | _Never_ 
[15452469] 11.820 _3574_| 1.832 [| Never [0 | Never 
 6636943[ 11.832 2.648 | 2.864 [Never [0+ Never 
T¥7s6s211[ 13.628 | 2.756 | 3560 [Never | 0 | Never _ 


Median: 6636942 11.840 4.518 0.902


"Introns or recessive genes were found sporadically throughout the rest of the generations, but were 
primarily absent. 


















Table 25. Population Statistics for Population Size of 500 with Probability of Mutation of 0.01 
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Median Population Fitness | Median Population Complexity | Median Population Dominants | Median Population Recessives | Generation in which Recessives Disappeared from Population | Median Population Introns | Generation in which Introns Disappeared from Population


1525400711786 S373| 0004 | Never 0.001| Never 
T9278676[ 11.778 [3377 _| 0004 | Never | 0.001_| Never 
9092490, 11.792 3.381 | 0.004 | Never | 0.001_| Never 
[10378340] 11.800 | 3.384 | 0004 [Never | 0.001_| Never 
7 9932836 11.738 | 5.350 | 0.005 | Never] 0.001__| Never 
11085528, 11.736 | 3363 | 0008] Never | 0.001 | Never 
9816659, 11.766 | 5.365 | 0.005 | Never | 0.001_| Never 
rosisa6s| 11774 | 5372 | 0005 | Never | 0.001__[ Never 
[ 8ag1694] 13.588 | 6.262 | 0.005 [Never | 0.001 [Never 
[3133824] 13.490 | 6214 | 0.007 | Never | 0.001__[ Never 
[s7aro1a| 11786 | 4a95__| 0889 | Never | 0.001| Never 
[3513889] 13.580 | 3401 | 0.891 | Never | 0.001_| Never 
Troxa2374] 11.736 | 4469 | 0.893__| Never [| 0.001 Never 
[9542953| 1-782 _[ 4486 | 0.893 | Never | 0.001 | Never 
[10376794[ 11.790 | 4490 | 0893 | Never [0.001 | Never 
79115347|_13.600 | 5400 | 0900__| _Never_| 0.001_| Never 
[4446680[_ 11.820 | 4483 | 0916 | Never | 0001 | Never 
[356949 20.480 | 8803 | 0.919 | Never | 0.001 | Never 

o7asz28[_ 11.804 [3566 | 1.829 | Never | 0.001 | Never 


Median: 9278676 11.736 5.365 0.889


Table 26. Population Statistics for Population Size of 1,000 with Probability of Mutation of 0.01 
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Tables 27 and 28 show the best individual found in tests with a 0.1 probability of mutation and population sizes of 500 and 1,000, respectively, and Tables 29 and 30 show the population statistics. Although Table 28 has one excellent solution and several very good ones, the overall performance is poor compared with the control runs. In fact, the values achieved are no better than those in the previous tables. The reason for these disappointing results is the quick convergence caused by the introns. Although there are examples of the best-fit individual being found after the first 100 generations, the median generations of convergence are 10 and 16, respectively. The introns are prevalent enough to counteract the beneficial effects of crossover that push the population toward better solutions.
Table 27. Best-Fit Individual for Population Size of 500 with Probability of Mutation of 0.1

Table 28. Best-Fit Individual for Population Size of 1,000 with Probability of Mutation of 0.1
Median Population Fitness | Median Population Complexity | Median Population Dominants | Median Population Recessives | Generation in which Recessives Disappeared from Population | Median Population Introns | Generation in which Introns Disappeared from Population
[illegible] | [illegible] | [illegible] | [illegible] | Never | 0.010 | Never
14292803 | 11.860 | 4.486 | 0.944 | Never | 0.010 | Never
16806734 | 13.680 | 4.508 | 1.822 | Never | 0.010 | Never
13945383 | 13.688 | 3.616 | 2.728 | Never | 0.010 | Never
4970352 | 11.816 | 5.346 | 0.082 | Never | 0.012 | Never
2098868 | 13.688 | 4.456 | 1.852 | Never | 0.016 | Never
Median: 7816937 | 11.888 | 4.504 | 0.944 | | 0.012 |


Table 29. Population Statistics for Population Size of 500 with Probability of Mutation of 0.1 
Median Population Fitness | Median Population Complexity | Median Population Dominants | Median Population Recessives | Generation in which Recessives Disappeared from Population | Median Population Introns | Generation in which Introns Disappeared from Population
13871924 | 11.842 | 3.367 | [illegible] | Never | 0.011 | Never
901914 | 20.466 | 9.635 | 0.060 | Never | 0.016 | Never
Median: 8528211 | 13.598 | 5.366 | 0.053 | | 0.012 |


Table 30. Population Statistics for Population Size of 1,000 with Probability of Mutation of 0.1 
The results in the remaining tables support the premise presented in Chapter II that introns counteract some of the effects of potentially harmful mutations. These runs had the highest probabilities of mutation and showed the best results.

Tables 31 and 35 show the best-fit individuals for a population size of 500 with probabilities of mutation of 0.5 and 1.0, respectively. While the median best fitness is similar to that of previous runs, 12 of the 42 runs found a good solution. Runs with a population size of 1,000, shown in Tables 32 and 36, achieve a far better median solution.
Table 31. Best-Fit Individual for Population Size of 500 with Probability of Mutation of 0.5


Across all of the previous tables, the median complexities give an average best-fit complexity of 15 and an average population complexity of 11.8, with little variance. It is possible to achieve a good result with a complexity of 15, but examples of this are few. Looking back through the tables, one sees that the sporadic good results are often accompanied by higher complexity. In Tables 31, 32, 35 and 36, the runs that achieved a good result have higher complexities.
Table 32. Best-Fit Individual for Population Size of 1,000 with Probability of Mutation of 0.5


The increase in complexity is caused by the mutation operation. There is a 50% chance of replacing a mutated branch with a new full branch that can extend to a depth of between one and three. In the remaining 50% of cases, the new branch also extends to a depth of one to three but cannot be a complete subtree. Which case occurs depends on the random seed entered at the beginning of the run (see Appendix D). In the previous runs with lower probabilities of mutation, only replacement by a more complex subtree during crossover could increase complexity. Because the introns cause the average population fitness to fall within a smaller interval, the range of complexities also narrows. When the range of complexities is smaller, the population homogenizes because the number of possible combinations has been reduced. In a homogeneous population, crossover simply exchanges similar subtrees, that is, subtrees similar in composition and complexity, and exchanging similar subtrees will not drive the population to a better solution.
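The mutation step described above can be sketched as follows. This is a minimal, hypothetical Python illustration rather than GPSYS code: the tuple-based tree representation, the function and terminal sets, and the names `full_tree`, `grow_tree`, and `mutate` are all assumptions. Only the 50/50 choice between a full replacement branch and a non-full branch, each of depth one to three, comes from the text. For simplicity the non-full branch uses the common "grow" method, which usually, but not always, yields an incomplete subtree.

```python
import random

# Hypothetical function and terminal sets (not the sets used in the thesis).
FUNCTIONS = ["+", "-", "*"]
TERMINALS = ["x", "1"]


def full_tree(depth, rng):
    """Full method: every branch reaches exactly `depth` levels."""
    if depth == 0:
        return rng.choice(TERMINALS)
    return (rng.choice(FUNCTIONS),
            full_tree(depth - 1, rng),
            full_tree(depth - 1, rng))


def grow_tree(depth, rng, is_root=True):
    """Grow method: below the root, a branch may stop early at a terminal."""
    if depth == 0 or (not is_root and rng.random() < 0.5):
        return rng.choice(TERMINALS)
    return (rng.choice(FUNCTIONS),
            grow_tree(depth - 1, rng, False),
            grow_tree(depth - 1, rng, False))


def depth_of(tree):
    """Depth of a tree; a lone terminal has depth 0."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth_of(tree[1]), depth_of(tree[2]))


def node_paths(tree, prefix=()):
    """Yield the path (a tuple of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for index in (1, 2):
            yield from node_paths(tree[index], prefix + (index,))


def replace_at(tree, path, new_branch):
    """Return a copy of `tree` with the node at `path` replaced."""
    if not path:
        return new_branch
    parts = list(tree)
    parts[path[0]] = replace_at(tree[path[0]], path[1:], new_branch)
    return tuple(parts)


def mutate(tree, rng):
    """Replace a random node with a new branch of depth 1 to 3.

    With probability 0.5 the new branch is full; otherwise it is grown.
    """
    target = rng.choice(list(node_paths(tree)))
    depth = rng.randint(1, 3)
    if rng.random() < 0.5:
        new_branch = full_tree(depth, rng)
    else:
        new_branch = grow_tree(depth, rng)
    return replace_at(tree, target, new_branch)


rng = random.Random(1)          # a fixed seed makes the run reproducible
parent = ("+", "x", "1")
child = mutate(parent, rng)
print(child, depth_of(child))
```

Because every random choice flows through one seeded `random.Random` object, rerunning with the same seed reproduces the same mutations, which mirrors the role of the seed entered at the start of a run. A mutation at a leaf can add up to three levels, which is how this kind of operator raises complexity.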
Median Population Fitness | Median Population Complexity | Median Population Dominants | Median Population Recessives | Generation in which Recessives Disappeared from Population | Median Population Introns | Generation in which Introns Disappeared from Population
1899637 | 12.268 | 4.516 | 1.108 | Never | 0.070 | Never
710391 | 14.164 | 5.310 | 1.198 | Never | 0.086 | Never
Median: 1750736 | 12.328 | [illegible] | [illegible] | | 0.072 |


Table 33. Population Statistics for Population Size of 500 with Probability of Mutation of 0.5 
Median Population Fitness | Median Population Complexity | Median Population Dominants | Median Population Recessives | Generation in which Recessives Disappeared from Population | Median Population Introns | Generation in which Introns Disappeared from Population
[illegible] | 12.164 | [illegible] | [illegible] | [illegible] | [illegible] | Never
345916 | 21.036 | 7.024 | 2.921 | Never | 0.091 | Never
Median: 168493 | 14.102 | 5.368 | 0.307 | | 0.084 |


Table 34. Population Statistics for Population Size of 1,000 with Probability of Mutation of 0.5 
Table 35. Best-Fit Individual for Population Size of 500 with Probability of Mutation of 1.0

Table 36. Best-Fit Individual for Population Size of 1,000 with Probability of Mutation of 1.0
Median Population Fitness | Median Population Complexity | Median Population Dominants | Median Population Recessives | Generation in which Recessives Disappeared from Population | Median Population Introns | Generation in which Introns Disappeared from Population
443893 | 16.620 | 6.236 | 1.618 | Never | 0.310 | Never
Median: 333436 | [illegible] | 6.032 | 2.270 | | [illegible] |


Table 37. Population Statistics for Population Size of 500 with Probability of Mutation of 1.0 
Median Population Fitness | Median Population Complexity | Median Population Dominants | Median Population Recessives | Generation in which Recessives Disappeared from Population | Median Population Introns | Generation in which Introns Disappeared from Population
Median: 376612 | 18.164 | [illegible] | [illegible] | | 0.314 |


Table 38. Population Statistics for Population Size of 1,000 with Probability of Mutation of 1.0 
VII. CONCLUSIONS AND RECOMMENDATIONS


A. CONCLUSIONS 


Although the study was an interesting one, incorporating introns and dominant/recessive genes into GPSYS-1.1, as they were implemented in this thesis, did not improve performance except in environments with a high probability of mutation. Under a low probability of mutation, the population is affected mostly by genetic drift, which is introduced by adding dominant and recessive genes. Introns cause the fitness to converge to a local optimum, and genetic drift eventually removes the introns from the population entirely. Recessive genes are the next to succumb to the effects of genetic drift. Although dominant and recessive genes are implemented to follow Mendel’s law of dominance, the population behaves as if it were underdominant (see Appendix C). Populations exhibiting underdominance are subject to the effects of genetic drift because they exhibit disruptive selection. Under disruptive selection, whether an allele (here, a function within the gene tree) persists depends on that allele’s initial frequency: if the initial frequency is below a certain threshold, the allele frequency falls to zero due to genetic drift.
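The threshold behavior described above can be illustrated with the standard textbook recursion for selection at one diploid locus with two alleles. This sketch is not code from the thesis: the fitness values are arbitrary assumptions chosen so that the heterozygote is least fit (underdominance), and the model is deterministic, so it shows the selection threshold without the sampling noise that constitutes genetic drift in a finite population.

```python
def next_frequency(p, w11, w12, w22):
    """Frequency of allele A1 after one generation of selection.

    p is the frequency of A1; w11, w12 and w22 are the relative fitnesses
    of genotypes A1A1, A1A2 and A2A2 (random mating assumed).
    """
    q = 1.0 - p
    mean_fitness = p * p * w11 + 2.0 * p * q * w12 + q * q * w22
    return (p * p * w11 + p * q * w12) / mean_fitness


def unstable_threshold(w11, w12, w22):
    """Interior equilibrium; it is unstable when w12 is below w11 and w22."""
    return (w22 - w12) / (w11 - 2.0 * w12 + w22)


def run(p, generations, w11=1.0, w12=0.8, w22=1.0):
    """Iterate the recursion; the default fitnesses are illustrative only."""
    for _ in range(generations):
        p = next_frequency(p, w11, w12, w22)
    return p


print(unstable_threshold(1.0, 0.8, 1.0))  # about 0.5 for these fitnesses
print(run(0.45, 200))   # starts below the threshold and heads toward 0
print(run(0.55, 200))   # starts above the threshold and heads toward 1
```

An allele that starts just below the threshold is steadily eliminated, and one that starts just above it is driven toward fixation, which is the dependence on initial frequency described in the paragraph above.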

For populations that have a lower probability of mutation but are not completely dominated by genetic drift, introns act to prevent destructive crossovers. However, because introns cause convergence, they also cause the population fitness to fall within a smaller interval. With fewer distinct types of individuals, there is a smaller chance that a crossover will produce a much fitter offspring.

Under a high probability of mutation, the effects of genetic drift are counteracted and the population is able to achieve reasonably good results, but never as good as those of the unmodified version of GPSYS. The reason for the improved performance in a high-mutation environment is that introns provide a subtree that can be mutated without affecting the overall fitness of the individual. Additionally, mutations that do not occur within an intron subtree are allowed to increase the complexity of the gene tree. This increase allows the gene tree to approach the minimum complexity necessary to find a good solution.
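The neutral-mutation argument above can be illustrated with a small expression-tree evaluator. This is a hypothetical representation, not the encoding used in this thesis or in GPSYS: it assumes an "intron" node whose second branch is carried along but never evaluated. A mutation confined to that branch leaves the individual's output, and therefore its fitness, unchanged, while a mutation in an expressed branch does not.

```python
def evaluate(tree, x):
    """Evaluate a hypothetical expression tree at the point x.

    A tree is "x", a number, (op, left, right) with op in + - *, or
    ("intron", expressed, silent), which returns only the expressed branch.
    """
    if isinstance(tree, str):
        if tree == "x":
            return x
        raise ValueError(f"unknown terminal {tree!r}")
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    if op == "intron":
        return evaluate(left, x)      # the silent branch is never evaluated
    a, b = evaluate(left, x), evaluate(right, x)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    raise ValueError(f"unknown operator {op!r}")


original = ("+", ("intron", ("*", "x", "x"), ("-", "x", 3)), 1)
# Mutation inside the silent branch: the expressed function is unchanged.
mutated_intron = ("+", ("intron", ("*", "x", "x"), ("*", 7, ("+", "x", 2))), 1)
# Mutation inside an expressed branch: the function changes.
mutated_exon = ("+", ("intron", ("-", "x", "x"), ("-", "x", 3)), 1)

samples = range(-5, 6)
print(all(evaluate(original, v) == evaluate(mutated_intron, v) for v in samples))
print(any(evaluate(original, v) != evaluate(mutated_exon, v) for v in samples))
```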
B. RECOMMENDATIONS FOR DOD


Although the results achieved in this thesis were not stellar, this does not mean that the biological mechanisms could not succeed if implemented elsewhere or implemented differently. Chapter III provided several examples of work in which similar ideas were presented, tested, and, in some cases, found successful.
C. SUGGESTED FURTHER STUDIES


If one is interested in continuing this work, the first recommendation is to restructure the Chromosome class defined in GPSYS. The chromosome is haploid, but was modified in this thesis to make it act as if it were diploid. A different approach would be to link genes together to form true homozygous and heterozygous gene pairs, which could then be connected to form the chromosome. The difficulty of this approach is that one loses the convenient properties of a binary tree, which are fully exploited in GPSYS. However, the increased complexity and a truly diploid chromosome have the potential to produce more “life-like,” if not better, results.
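One way to picture the recommended restructuring is sketched below. It is a hypothetical Python illustration, not GPSYS's Chromosome class: the names `Allele`, `GenePair`, `make_gamete`, and `fertilize` are invented, each allele is reduced to a single label rather than a subtree, and the tie-breaking rule for two alleles of equal dominance is an assumption. It shows true gene pairs, dominance-based expression, and segregation into haploid gametes.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Allele:
    value: str      # hypothetical: e.g. the function stored at a tree node
    dominant: bool


@dataclass(frozen=True)
class GenePair:
    first: Allele
    second: Allele

    def is_homozygous(self):
        return self.first == self.second

    def expressed(self):
        """Law of dominance: a dominant allele masks a recessive one.

        If both alleles have the same dominance, the first is expressed
        (an arbitrary tie-breaking assumption).
        """
        if self.first.dominant or not self.second.dominant:
            return self.first
        return self.second


def make_gamete(chromosome, rng):
    """Law of segregation: each gamete gets one allele from every pair."""
    return [rng.choice((pair.first, pair.second)) for pair in chromosome]


def fertilize(egg, sperm):
    """Pair the alleles of two haploid gametes into a diploid chromosome."""
    return [GenePair(a, b) for a, b in zip(egg, sperm)]


def phenotype(chromosome):
    return [pair.expressed().value for pair in chromosome]


PLUS = Allele("+", dominant=True)
TIMES = Allele("*", dominant=False)
mother = [GenePair(PLUS, TIMES), GenePair(TIMES, TIMES)]
father = [GenePair(TIMES, PLUS), GenePair(PLUS, TIMES)]

rng = random.Random(7)
child = fertilize(make_gamete(mother, rng), make_gamete(father, rng))
print(phenotype(mother))  # ['+', '*']
print(phenotype(child))
```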

If one is interested in applying the biological mechanisms to a different problem, we suggest using another tool, such as GALib or Avida (both mentioned in Chapter V). These tools provide increased functionality and more choices for types of selection and types of crossover. GALib actually comes configured to make diploid chromosomes. Additionally, since these tools are implemented in C++, the programmer has access to pointers, which allow more complex tree manipulations than those in GPSYS.

A compilation of the many GA and GP tools available would have helped this research tremendously; one of the most difficult tasks of this thesis was determining which tool to use. A worthwhile thesis would be to acquire and test the available tools, providing an overview of each tool’s strengths and weaknesses.
APPENDIX A. A CELL BIOLOGY PRIMER


1. Cell Structure 


It is necessary to understand the basic function of each part of the cell (the parts are collectively called organelles) in order to understand the genetic mechanisms described in this thesis.


A. The Cell Membrane
All cells have a cell membrane, a lipid bilayer that protects the inside of the cell from the outside environment. Although the membrane is fluid and semi-permeable, it provides the outward structure of the cell.


B. The Nucleus and the Nucleolus 

In cells that have a nucleus, the nucleus is where the genetic material is found. Proteins needed inside the nucleus and genetic material used outside the nucleus are transported through pores in the nuclear membrane that surrounds the nucleus. The nucleolus, which is usually visible as a dark spot in the nucleus, is the site of ribosome formation. Organisms lacking cell nuclei are called procaryotes, whereas those that do have cell nuclei are called eucaryotes.


C. The Endoplasmic Reticulum (ER) 

This organelle is the transport network for molecules targeted for certain modifications and specific final destinations, as opposed to molecules that float freely in the cytoplasm. There are two types: rough and smooth. Rough ER has ribosomes attached to it; smooth ER does not. The smooth ER is responsible for the production of lipids, such as those used to maintain the cell membrane.


D. The Ribosomes 
These organelles are responsible for protein synthesis in the cell. They can be attached to the ER or float freely in the cytoplasm.
E. The Golgi Apparatus or Golgi Complex

This organelle sorts newly made proteins, packages them into small membrane-bound sacs called vesicles, and sends them to their proper places, both within and outside the cell. Proteins that are secreted often act as messengers to the membranes of other cells.


F. The Mitochondria 
These organelles are the sites of aerobic respiration and are responsible for making adenosine triphosphate (ATP), which is the cellular energy source.


G. The Lysosomes 

These organelles are small sacs that contain many enzymes. They degrade waste materials and food within the cell by breaking down molecules of DNA, protein, and lipids into their base components, which can then be reused by the cell.


H. The Peroxisomes 
These organelles are the subcellular location of important metabolic reactions, and they rid the cell of toxic substances. Relative to other organelles, little is known about how cells maintain and propagate peroxisomes.


I. The Centrioles and Vacuoles 


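Centrioles are found in animal cells but are generally absent from the cells of higher plants, whereas large central vacuoles are characteristic of plant cells.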


J. The Cytoplasm 

The cytoplasm includes everything inside the cell membrane except the nucleus, including a fluid called the cytosol. Proteins travel around the cell through the cytosol, guided by a network of fibers called the cytoskeleton. The cytoskeleton determines the shape of the cell and helps it to move.
2. Cell Division 


Cellular division occurs by a process called mitosis (see Figure 20). The result of mitosis is two daughter cells, each of which contains a complete copy of the chromosomes within its nucleus.


Sexually reproducing organisms also perform a special type of cell division, called meiosis, to create gametes (see Appendix B, Figure 22). Sexually reproducing organisms are diploid, whereas their gametes are haploid. Meiosis reduces the chromosome number in the gamete such that when the gametes from each parent fuse together, a diploid zygote is formed.


[Figure: a cell nucleus with chromosomes; the DNA replicates; two daughter cells result]

Figure 20. Mitosis
APPENDIX B. AN ADDITIONAL MOLECULAR GENETICS PRIMER


The molecular genetics primer provided in Chapter II was only an overview of the necessary information. This appendix is provided for the reader who would like to know, or perhaps needs, additional information.
1. DNA Structure 


DNA is a very long, threadlike macromolecule made up of a large number of deoxyribonucleotides, each composed of a base, a sugar and a phosphate group. The sugar and phosphate groups are responsible for the physical structure of the DNA polymer; the base carries the genetic information. These nucleotide bases fall into two categories: purines and pyrimidines. The purines in DNA are adenine (A) and guanine (G); the pyrimidines are thymine (T) and cytosine (C).

In 1953, James Watson and Francis Crick deduced the structure of DNA, the well-known double helix (see footnote 24). One of the most important aspects of the double helix is the specificity with which the nucleotide bases are paired: T only pairs with A, and G only pairs with C. As a result of that strict pairing, one strand of the double helix is the complement of the other.
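Because the pairing rules are strict, either strand determines the other. The short sketch below (illustrative Python, not from the thesis) computes the complementary strand; since the two strands run in opposite directions, the partner strand read in its own 5' to 3' direction is the reversed complement. The example sequence is the coding strand shown in Figure 24.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Pair every base with its partner (A with T, G with C)."""
    return "".join(PAIRS[base] for base in strand)


def reverse_complement(strand):
    """The partner strand read in its own 5' to 3' direction."""
    return complement(strand)[::-1]


coding = "ATGGAATTCTCGCTC"
print(complement(coding))          # TACCTTAAGAGCGAG
print(reverse_complement(coding))  # GAGCGAGAATTCCAT
```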
2. DNA Replication 


When a cell undergoes mitosis (see Appendix A), prior to the actual cell division, the parent duplicates its chromosomes by a process called DNA replication (see Figure 21). It is important to note that replication is semi-conservative, meaning that of the two strands found in each daughter molecule, one comes directly from the parent and the other is newly synthesized. Assuming no error, the complementary chains act as templates for each other during replication. A DNA strand that can replicate is called a replicon.


24 The double helix occurs in three forms: A-DNA, B-DNA, and Z-DNA. The differences among the three will not be discussed. DNA can also be circular and supercoiled; the significance of those forms will also not be covered. Because its structure does affect its chemical reactivity, it is important to mention that these differences exist.
Replication begins at a unique site at which the DNA molecule is cleaved. The highly specific enzymes used in this cleavage process recognize a palindromic nucleotide sequence that possesses two-fold rotational symmetry. In other words, if one strand of DNA has the sequence G-A-A-T-T-C, with a cleavage site between the Ts, the complementary strand will have the sequence C-T-T-A-A-G, with the cleavage also between the Ts.
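The symmetry described above can be checked mechanically: a site has two-fold rotational symmetry when it equals its own reverse complement, and a cut between the Ts on one strand corresponds to a cut at the mirror-image position on the other. The Python sketch below is illustrative only; the helper names are assumptions, and the cut position of 4 simply encodes the "between the Ts" example from the text.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    return "".join(PAIRS[base] for base in strand)


def reverse_complement(strand):
    return complement(strand)[::-1]


def has_twofold_symmetry(site):
    """True if both strands read the same in their own 5' to 3' direction."""
    return site == reverse_complement(site)


def cut_both_strands(site, cut):
    """Split the top strand after `cut` bases and the partner strand at the
    mirror-image position, both written left to right as in the text."""
    bottom = complement(site)          # partner strand, aligned under the top
    mirror = len(site) - cut
    return (site[:cut], site[cut:]), (bottom[:mirror], bottom[mirror:])


print(has_twofold_symmetry("GAATTC"))   # True
print(cut_both_strands("GAATTC", 4))    # (('GAAT', 'TC'), ('CT', 'TAAG'))
```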

Replication is not completely error-free. The affinity of A for T, and G for C 
helps to keep copy errors low. Some bacteria have spontaneous mutation rates of 10°’ per 
replication, with the more complex fruit fly having a rate of 10° [Ref 26]. Simplifying 
greatly, if an incorrect base is connected to the DNA oligomer during replication, a 
correction enzyme will usually catch it and remove it before the next base is added. 
However, if an error does occur, it is classified as one of three types of mutation: substitution, insertion, or deletion.

Substitution is the replacement of one base pair by another. Replacement of a purine by a purine or of a pyrimidine by a pyrimidine is called a transition; replacement of a purine by a pyrimidine (or vice versa) is called a transversion. Substitution is the most common type of mutation because it is not always caught by the correction enzyme, especially when a tautomer of the base is actually added (see footnote 25) [Ref 27].
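The classification rule above depends only on whether the old and new bases belong to the same chemical class. A small illustrative Python sketch (not from the thesis) makes this explicit. Enumerating every possible single-base substitution also shows that there are twice as many possible transversions as transitions, although this count alone says nothing about how often each kind occurs in real DNA.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(old, new):
    """Label a single-base substitution as a transition or a transversion."""
    if old not in PURINES | PYRIMIDINES or new not in PURINES | PYRIMIDINES:
        raise ValueError("bases must be A, C, G or T")
    if old == new:
        raise ValueError("a substitution must change the base")
    same_class = {old, new} <= PURINES or {old, new} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


kinds = [classify_substitution(a, b) for a, b in permutations("ACGT", 2)]
print(kinds.count("transition"), kinds.count("transversion"))  # 4 8
```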

Insertion and deletion are simply the addition or removal of one or more base pairs in the resulting DNA strand. Because of the error-correcting enzymes, these mutations are less common during replication. After the DNA is formed, though, some types of molecules are capable of intercalating between adjacent base pairs, which can lead to an insertion or a deletion. If the number of bases inserted or deleted is not a multiple of three, these mutations alter the reading frame of the DNA.
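The reading-frame effect can be seen by splitting a sequence into codons before and after an insertion or deletion. In this illustrative Python sketch (the helper names are assumptions), inserting or deleting one base changes every downstream codon, while inserting three bases adds one codon and leaves the downstream codons intact. The example gene is the coding strand from Figure 24.

```python
def codons(sequence):
    """Split a sequence into consecutive triplets, ignoring leftover bases."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


def insert(sequence, position, bases):
    return sequence[:position] + bases + sequence[position:]


def delete(sequence, position, count):
    return sequence[:position] + sequence[position + count:]


gene = "ATGGAATTCTCGCTC"
print(codons(gene))                      # ['ATG', 'GAA', 'TTC', 'TCG', 'CTC']
print(codons(insert(gene, 3, "A")))      # ['ATG', 'AGA', 'ATT', 'CTC', 'GCT']
print(codons(insert(gene, 3, "AAA")))    # ['ATG', 'AAA', 'GAA', 'TTC', 'TCG', 'CTC']
print(codons(delete(gene, 3, 1)))        # ['ATG', 'AAT', 'TCT', 'CGC']
```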


25 A tautomer is any molecule that has an isomer with which it is in chemical equilibrium.
[Figure: DNA replication; the legend distinguishes original bases from new bases]

Figure 21. DNA Replication


There are a number of reasons to consider the simplest type of mutation, the 
exchange of one base for another. First, base switches reflect the basic accuracy of DNA 
replication. Second, many important mutagens act by making single-base changes. 
Finally, single-base mutations are critical to evolution, because they change genes in 
ways that are subtle enough to yield useful variants. It is also important to note that not all sites undergo mutation at the same rate; some are hot spots, at which mutations occur much more often than at most other sites.

Single-base switches are usually reversible, and often the rate of “back” mutation 
to the normal nucleotide arrangement is similar in order of magnitude to the rate of 
change to the mutant arrangement. This fact represents an important way to distinguish 
base switches from more drastic alterations like large deletions, for which the reverse 
reaction (called reversion) is impossible. An intermediate case is that of single-base 
insertions or deletions; these may revert, but much less often than single-base switches. 
Most spontaneous single-base exchanges are simply rare failures in the replication process, arising when a nucleotide is added to the growing chain even though the nucleotide does not pair normally with the template base.
3. DNA Recombination and Transposons


During meiosis, DNA is broken apart and reformed in a process called recombination (see Figure 22). If the chromosomes break during the process, each may swap a portion of its genetic material for the matching portion from its mate. This swapping of genetic material is called a crossover.

Recombination also occurs outside of meiosis because it is one of the methods by which DNA repairs itself. In fact, recombination is initiated by breaks or gaps in the DNA. During
meiosis, the DNA is cut enzymatically in order to initiate the process. Regardless of why 
it occurs, it is important to note that the breakage points in the recombinants must lie 
between the same nucleotides in the two homologous chromatids, otherwise 
recombination would generate new DNA molecules differing in length from the parental 
molecules. Complementary base-pairing between DNA strands unwound from two 
different chromosomes puts the chromosomes in exact register, so that when crossover 
occurs, it occurs between identical, or nearly identical, DNA sequences. Usually these 
sequences are two equivalent regions of homologous chromosomes, but crossover can 
also occur between homologous segments in nonequivalent regions (as long as the 
recombinant survives). Such unequal crossing can cause duplications, insertions and 
deletions in chromosomes. Crossing over at nonreciprocal sites is called gene 
conversion. 
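The length argument above can be made concrete by treating chromosomes as strings. In the illustrative Python sketch below (not from the thesis), breaking both parents at the same position preserves length, while breaking them at different positions, as can happen when repeated segments misalign, yields one recombinant with a duplication and one with a deletion. GP crossover in GPSYS swaps subtrees rather than string segments, so this is only an analogy.

```python
def homologous_crossover(a, b, point):
    """Break both chromosomes at the same position and swap the tails."""
    return a[:point] + b[point:], b[:point] + a[point:]


def unequal_crossover(a, b, point_a, point_b):
    """Break the chromosomes at different positions and swap the tails."""
    return a[:point_a] + b[point_b:], b[:point_b] + a[point_a:]


print(homologous_crossover("AAAAAA", "CCCCCC", 2))  # ('AACCCC', 'CCAAAA')

# Each letter stands for a whole gene; the chromosome carries a tandem
# repeat, so the segment "ABC" in one copy can pair with the wrong "ABC".
chromosome = "ABCABC"
longer, shorter = unequal_crossover(chromosome, chromosome, 3, 0)
print(longer, shorter)                               # ABCABCABC ABC
```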

DNA segments can also be moved by site-specific recombination, which is not 
guided by homology but can cut and rejoin DNA at precise sites. In general, site-specific 
recombination serves some type of regulatory function, such as the formation of antibody 
genes. Site-specific recombination has one of two consequences, depending on how the 
interacting sites are oriented. Recombination can either remove the intervening segment or invert it, which will change how the gene is expressed.
Figure 22. Meiosis


Although site-specific recombination can invert or remove DNA segments, it usually does not affect gene order, because site-specific recombinations are limited and conservative and, in higher organisms, are often not inherited because they occur only in somatic cells. Transposons are different because they move genes to new and unrelated sites. They are key factors in drug-resistant bacteria.

The transposon encodes within its sequence the genes that allow it to move. 
There are two types: complex and simple. Complex transposons hold one or more genes 
in addition to the ones coding for their transposition. Simple transposons, also known as 
insertion sequences, only carry the genes necessary for their own transposition. Whether 
complex or simple, transposons interrupt or deactivate genes into which they are inserted. 
Transposons are identified by a sequence at one end of the gene that is repeated nearly identically, but oriented oppositely, at the other end. These areas are called inverted repeats. Although their numbers are very small, they exist in all organisms.
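The inverted-repeat signature can be checked by comparing one end of a candidate element with the reverse complement of the other end. The sketch below is a simplified, illustrative Python check, not a real transposon finder: the arm length, the mismatch allowance used to model "nearly identical" repeats, and the example sequences are all assumptions.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    return "".join(PAIRS[base] for base in reversed(strand))


def has_inverted_repeats(element, arm, max_mismatches=0):
    """Check whether the first `arm` bases reappear, reverse-complemented,
    as the last `arm` bases, allowing a few mismatches."""
    if arm <= 0 or 2 * arm > len(element):
        raise ValueError("arm length must fit twice inside the element")
    left = element[:arm]
    right_as_left = reverse_complement(element[-arm:])
    mismatches = sum(x != y for x, y in zip(left, right_as_left))
    return mismatches <= max_mismatches


exact = "CAGGTT" + "ATGCATGC" + "AACCTG"
near = "CAGGTT" + "ATGCATGC" + "AACGTG"
print(has_inverted_repeats(exact, 6))        # True
print(has_inverted_repeats(near, 6))         # False
print(has_inverted_repeats(near, 6, 1))      # True
```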
4. Central Dogma of Molecular Biology 


Figure 1 in Chapter II depicts what is better known as the Central Dogma of Molecular Biology, because it demonstrates how the sequence of a strand of DNA corresponds to the amino acid sequence of a protein. RNA was the original genetic material, but through evolution, it became an intermediary between DNA and the protein it encodes in eucaryotic organisms.
There are actually four different types of RNA used in gene expression: messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), and small nuclear RNA (snRNA). mRNA is the template for protein synthesis. An mRNA molecule is produced for each gene or group of genes that is to be expressed. It is created from a DNA template, with a double-stranded template being the preferred type. tRNA is a small RNA that has a very specific secondary and tertiary structure such that it can bind an amino acid at one end and mRNA at the other end. It acts as an adaptor to carry the amino acid elements of a protein to the appropriate place as coded for by the mRNA. rRNA is one of the structural components of the ribosome. Parts of its sequence are complementary to regions of the mRNA so that the ribosome knows where to bind to an mRNA from which to make a protein. snRNA is involved in the machinery that processes RNAs as they travel between the nucleus and the cytoplasm. RNA does not use the nucleotide base thymine; it uses one called uracil (U) instead.

The enzyme that is used to start transcription has to find the beginning of a gene so that it knows where to start. There is a particular DNA sequence at the beginning of genes, called a promoter, that is recognized by this enzyme. It is a unidirectional sequence on one strand of the DNA that tells the enzyme both where to start and on which strand to continue. Two common procaryotic promoter consensus sequences are TATAAT and TTGACA; common eucaryotic promoters are the CAAT and TATAAA boxes. Both sequences are found at specific locations on the DNA before the precise spot at which the mRNA is to begin. The DNA strand from which the mRNA is copied is the antisense or template strand. The other strand, to which the mRNA is identical (except that uracil replaces thymine), is the sense or coding strand.

[Figure: gene transcription begins at base pair 0; promoter consensus sequences lie at -35 and -10 base pairs]

Figure 23. Promoter Consensus Sequences


Transcription stops when the mRNA reaches a termination signal. In general, the termination signal is a series of G and C bases that bind to each other in a hairpin loop, followed by a string of U residues. The strand of mRNA produced is complementary to the strand from which it was synthesized and identical to the other strand, except that uracil replaces thymine.




(5’-> 3’) ATGGAATTCTCGCTC (Coding, sense strand)
(3’ <- 5’) TACCTTAAGAGCGAG (Template, antisense strand)
(5’-> 3’) AUGGAAUUCUCGCUC (mRNA made from template strand)


Figure 24. Sense and Antisense Strands 
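The relationships in Figure 24 can be reproduced with a few lines of illustrative Python (not from the thesis). The mRNA is built by pairing against the template strand, with uracil taking thymine's place, and the result matches the coding strand with every T replaced by U.

```python
# Base pairing from a DNA template base to the RNA base placed opposite it.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template):
    """mRNA (5' to 3') from a template strand written 3' to 5', left to right."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template)


coding = "ATGGAATTCTCGCTC"       # sense strand, 5' to 3'
template = "TACCTTAAGAGCGAG"     # antisense strand, 3' to 5'
mrna = transcribe(template)
print(mrna)                              # AUGGAAUUCUCGCUC
print(mrna == coding.replace("T", "U"))  # True
```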


If the transcribed region contains introns, those regions are excised and the remaining sequences are spliced together. Introns nearly always begin with GU and end with an AG that is preceded by a pyrimidine-rich tract. This consensus sequence is part of the signal for splicing. Once the introns have been removed, the mature mRNA can leave the nucleus and travel to a ribosome, where the actual protein is synthesized.
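Splicing itself can be sketched as cutting out the intron spans and concatenating what remains. In this illustrative Python sketch (not from the thesis), the intron positions are supplied by hand, which is an assumption; in the cell they are located by the splicing machinery. The code only checks the GU...AG consensus ends, and the example intron is made up, with a pyrimidine-rich tract before its final AG.

```python
def splice(pre_mrna, introns):
    """Remove the given intron spans and join the remaining exons.

    `introns` is a list of (start, end) index pairs, end exclusive. Each
    intron is checked for the GU...AG consensus boundaries.
    """
    exons, previous_end = [], 0
    for start, end in sorted(introns):
        if start < previous_end:
            raise ValueError("introns overlap")
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"span {start}:{end} lacks GU...AG boundaries")
        exons.append(pre_mrna[previous_end:start])
        previous_end = end
    exons.append(pre_mrna[previous_end:])
    return "".join(exons)


exon1, intron, exon2 = "AUGGAA", "GUAAGUCCUUCAG", "UUCUCGCUC"
pre_mrna = exon1 + intron + exon2
start = len(exon1)
print(splice(pre_mrna, [(start, start + len(intron))]))  # AUGGAAUUCUCGCUC
```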

Once the mRNA binds to the rRNA of the ribosome, the tRNA can begin to build 
the protein. The start signal is the codon AUG found closest to the starting end, with 
GUG as a less common substitute. AUG codes for methionine, but this first amino acid 
is modified slightly. There are tRNAs for each of the amino acids, and each molecule 
contains an amino acid attachment site and a template-recognition site. The template- 
recognition site is called an anticodon, which recognizes the codon on the mRNA. The 
anticodon aligns with the codon, putting the amino acid in proximity to form peptide 
bonds with its neighbors. Translation is completed when a stop signal (UAA, UAG, or UGA) is reached. Post-translational processing of the protein will usually remove the starting methionine.
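The start and stop rules above can be sketched as a scan for the first AUG followed by reading successive codons until an in-frame stop codon. This illustrative Python sketch (not from the thesis) returns codons rather than amino acids, because a full codon table is beyond the scope of the example; the `start_codon` parameter allows the less common GUG start to be tried.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def read_to_stop(mrna, start_codon="AUG"):
    """Codons from the first start codon up to, not including, an in-frame stop."""
    start = mrna.find(start_codon)
    if start == -1:
        return []
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons


print(read_to_stop("GGAUGGAAUUCUAAGC"))  # ['AUG', 'GAA', 'UUC']
print(read_to_stop("AUGGAAUUCUCGCUC"))   # no stop codon, so all five codons are returned
```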
5. Amino Acids and Proteins 


It is important to point out that the position of each specific amino acid in the protein is exactly correlated with the location of its coding sequence along the DNA. This property is
called colinearity. Stated simply, colinearity is the property by which successive amino 
acids in a polypeptide chain are coded for by successive regions of a gene. 

There are twenty amino acids used in protein synthesis. These amino acids are divided into two groups, essential and non-essential. Essential amino acids must be ingested in order to maintain health, whereas non-essential amino acids are produced within the body (see Figure 25).
AMINO ACID | DESCRIPTION
Histidine (HIS) | Non-essential. Found in hemoglobin.
Methionine (MET) | Essential. Principal supplier of sulfur to prevent disorders of the hair, skin, and nails; regulates the formation of ammonia and creates ammonia-free urine.
[illegible] | [illegible] that transmits signals between nerve cells.
[illegible] | Non-essential. Synthesizes fatty acid sheath around nerve fibers.
[illegible] | Essential. An important part of collagen, elastin, and enamel protein.
Tyrosine (TYR) | Non-essential. Transmits nerve impulses to the brain and promotes functioning of the thyroid, adrenal, and pituitary glands.
Valine (VAL) | Essential. Found in muscle tissue.

Figure 25. The Amino Acids

6. RNA was the Original Molecule of Heredity 


It has been shown that RNA molecules, as well as proteins, can act as enzymes. As a result of this finding, it was proposed that RNA catalyzed its own replication and developed other enzymatic activities. In this view, DNA was later formed by the reverse transcription of this RNA, and it replaced RNA as the genetic material because its double helix is a more stable and more reliable store of genetic information [Ref 28].

Recent analyses of DNA sequences from many genomes provide hints of how such an early coding system may have been structured. The common characteristic found in today’s DNA is a constancy of the relative positions of purines and pyrimidines within protein-coding regions. Specifically, codons of the form RNY predominate (where R is a purine, Y is a pyrimidine, and N can be any base), suggesting that all codons may originally have been of this type. A primitive message composed exclusively of RNY codons could have been translated in only one of the three possible frames, circumventing the need for special start signals to fix the reading frame. Interestingly, among the eight amino acids specified by RNY codons in today’s code are those most likely to have been generated by prebiotic synthesis, as well as those that often appear in meteorites [Ref 29].

If today’s genomes are searched for RNY periodicity, the extent of coding regions 
and their correct reading frames can usually be identified. Genes that are well expressed 
seem to have best preserved the RNY pattern over their entire length. In other genes, the 
original RNY message can still be detected but appears badly mutated or is shifted over 
parts of the coding region into other reading frames by appropriate deletion-insertion 
pairs [Ref 30]. 
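The RNY analysis above can be sketched in a few lines. This illustrative Python example is not the method used in the cited studies: it scores each of the three reading frames by its fraction of RNY codons, picks the best-scoring frame as the likely reading frame, and checks the self-complementarity claim by confirming that the reverse complement of every RNY codon is again an RNY codon. The example message is made up.

```python
from itertools import product

PURINES = set("AG")
PYRIMIDINES = set("CTU")        # T in DNA, U in RNA
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def is_rny(codon):
    """True if the codon is purine, any base, pyrimidine."""
    return len(codon) == 3 and codon[0] in PURINES and codon[2] in PYRIMIDINES


def rny_fraction(sequence, frame):
    """Fraction of RNY codons when reading from offset `frame` (0, 1 or 2)."""
    triplets = [sequence[i:i + 3] for i in range(frame, len(sequence) - 2, 3)]
    if not triplets:
        return 0.0
    return sum(is_rny(t) for t in triplets) / len(triplets)


def best_frame(sequence):
    """Guess the reading frame as the one with the most RNY codons."""
    return max(range(3), key=lambda frame: rny_fraction(sequence, frame))


def reverse_complement(codon):
    return "".join(PAIRS[base] for base in reversed(codon))


message = "GACATTGGC"           # three RNY codons: GAC ATT GGC
print([rny_fraction(message, f) for f in range(3)])  # [1.0, 0.0, 0.0]
print(best_frame(message))      # 0

rny_codons = ["".join(c) for c in product("AG", "ACGT", "CT")]
print(len(rny_codons))          # 16
print(all(is_rny(reverse_complement(c)) for c in rny_codons))  # True
```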

Why might the RNY pattern have been favored during early evolution over other 
possible codon patterns? RNY is a self-complementary sequence. Moreover, a repeating 
RNY pattern can be perceived in a master tRNA sequence compiled from the several 
hundred tRNAs analyzed. Thus, it has been suggested that primitive tRNAs may have 
served dual roles as both adaptors and mRNAs. According to this scheme, before 
ribosomes appeared, the entire process of gene expression could have been carried out by 
a single class of RNA molecules [Ref 31]. 
APPENDIX C. POPULATION GENETICS PRIMER


1. Review of Mendelian Genetics 


Mendel proposed three laws that have since been extended, but that still remain the valid
rules of heredity. The law of dominance says that in a heterozygote, a diploid organism
whose two alleles at a specific locus are different, one allele may conceal the presence of
the other. The allele expressed in the phenotype is the dominant trait; the other is the
recessive trait. A homozygote is an organism whose alleles at that locus are the same,
whether both are dominant or both are recessive. It is important to point out that in some
organisms the heterozygous phenotype is intermediate between the homozygous phenotypes,
thereby exhibiting neither dominance nor recessiveness. The law of segregation says that in
a heterozygote the two different alleles separate from each other during gamete formation,
so that each gamete carries a single copy of one allele and the two alleles appear in equal
proportions among the gametes. The law of independent assortment states that alleles at
different loci are transmitted independently, so the inheritance of one characteristic is
usually not influenced by the inheritance of another. It has subsequently been shown that
genes lying close together on the same chromosome (linked genes) do not assort
independently, but the law holds in most cases.
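The law of segregation can be made concrete with a one-locus (monohybrid) cross. The sketch below is a minimal illustration. It encodes a genotype as the number of dominant alleles it carries (2, 1 or 0), the same encoding the Gene class in Appendix D uses for its `allele` field. It assumes each parent transmits either of its two alleles with probability 1/2. The class name `MonohybridCross` is hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration of Mendelian segregation at one locus.
// Genotype encoding: 2 = homozygous dominant, 1 = heterozygous, 0 = homozygous recessive.
final class MonohybridCross {

    /** Probability that a parent with this genotype passes on a dominant allele. */
    static double dominantGameteProb(int genotype) {
        if (genotype < 0 || genotype > 2) {
            throw new IllegalArgumentException("genotype must be 0, 1 or 2");
        }
        return genotype / 2.0; // each of the two alleles is transmitted with probability 1/2
    }

    /** Offspring genotype distribution, indexed {P(0), P(1), P(2)}. */
    static double[] offspring(int mum, int dad) {
        double pm = dominantGameteProb(mum);
        double pd = dominantGameteProb(dad);
        double[] dist = new double[3];
        dist[2] = pm * pd;                         // dominant allele from both parents
        dist[1] = pm * (1 - pd) + (1 - pm) * pd;   // dominant allele from exactly one parent
        dist[0] = (1 - pm) * (1 - pd);             // recessive allele from both parents
        return dist;
    }

    /** Under complete dominance, one dominant allele is enough to show the trait. */
    static boolean showsDominantTrait(int genotype) { return genotype > 0; }

    public static void main(String[] args) {
        // heterozygote x heterozygote: the classic 1:2:1 genotype ratio
        System.out.println(Arrays.toString(offspring(1, 1))); // [0.25, 0.5, 0.25]
    }
}
```

Under complete dominance, the 1:2:1 genotype ratio collapses to the familiar 3:1 phenotype ratio, because `showsDominantTrait` is true for both the 2 and the 1 genotypes.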
2. How Alleles Differ 


There is a subtle but important distinction emphasized in population genetics about the
ways in which alleles differ. Alleles are said to differ by origin if they come from the same
locus on different chromosomes; the two alleles at a given locus in a diploid individual
therefore always differ by origin. Alleles differ by state in one of two ways. If the context
of the difference is the DNA sequence, the alleles differ by state if they have different DNA
sequences. If the context is the product produced by the allele, the alleles differ by state if
and only if their products have different amino acids at a particular site. Finally, alleles
differ by descent when they do not share a common ancestor within a specified, finite number
of generations.
3. The Hardy-Weinberg Law


Because it is impossible to describe the genetic structure of a real-world population by
listing all the genotypes within it, relative frequencies of alleles and genotypes are used
instead. These relative frequencies are determined by probabilistic reasoning. For example,
if a locus has two alleles, $a_1$ and $a_2$, there are three possible genotypes: $a_1a_1$,
$a_1a_2$, and $a_2a_2$. The genotypes occur with relative frequencies $x_{11}$, $x_{12}$, and
$x_{22}$, respectively, where $x_{11} + x_{12} + x_{22} = 1$. The frequency of the $a_1$
allele is $p = x_{11} + \tfrac{1}{2}x_{12}$ and that of $a_2$ is $q = \tfrac{1}{2}x_{12} + x_{22}$.
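The allele-frequency formulas above count every $a_1$ copy in $a_1a_1$ individuals and half of the copies in $a_1a_2$ individuals. A minimal sketch follows. It assumes genotype data are given either as frequencies or as raw counts from a hypothetical sample, and the class name `AlleleFrequencies` is illustrative.

```java
// Hypothetical illustration: allele frequencies at a two-allele locus.
final class AlleleFrequencies {

    /** Returns {p, q} from genotype frequencies x11, x12, x22. */
    static double[] fromGenotypes(double x11, double x12, double x22) {
        if (x11 < 0 || x12 < 0 || x22 < 0 || Math.abs(x11 + x12 + x22 - 1.0) > 1e-9) {
            throw new IllegalArgumentException("genotype frequencies must be non-negative and sum to 1");
        }
        double p = x11 + 0.5 * x12; // all a1a1 plus half of a1a2
        double q = 0.5 * x12 + x22; // half of a1a2 plus all a2a2
        return new double[] {p, q};
    }

    /** Same calculation from raw genotype counts (a hypothetical sample). */
    static double[] fromCounts(int n11, int n12, int n22) {
        double n = n11 + n12 + n22;
        if (n <= 0) {
            throw new IllegalArgumentException("empty sample");
        }
        return fromGenotypes(n11 / n, n12 / n, n22 / n);
    }

    public static void main(String[] args) {
        double[] pq = fromCounts(30, 50, 20);
        System.out.printf("p = %.2f, q = %.2f%n", pq[0], pq[1]);
    }
}
```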

The Hardy-Weinberg law relates allele frequencies and genotype frequencies at an autosomal
locus in a randomly mating, infinitely large population at equilibrium. A single locus
reaches this equilibrium state if the population is free of other evolutionary forces such as
mutation, migration, and genetic drift. Random mating implies that mates are chosen with
complete ignorance of their genotype, degree of relationship (incest or inbreeding is
permissible), or geographic locality.²⁰ For the genotypes above, the expected frequencies are
$p^2$, $2pq$, and $q^2$, respectively, where $p$ is the frequency of the $a_1$ allele and $q$
is the frequency of the $a_2$ allele. Using this law, it is easy to show that rare alleles are
found mostly in heterozygotes: when $q$ is small, heterozygotes carrying $a_2$ occur with
frequency $2pq \approx 2q$, far more often than $a_2a_2$ homozygotes, which occur with
frequency $q^2$. As a consequence, the fate of a rare allele is tied to the dominance
relationship. Thus, dominance is an important factor in evolution.
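The claim that rare alleles sit mostly in heterozygotes follows directly from the Hardy-Weinberg proportions. Each $a_1a_2$ individual carries one $a_2$ copy and each $a_2a_2$ individual carries two, so the share of $a_2$ copies found in heterozygotes is $2pq/(2pq + 2q^2) = p$. A minimal sketch follows. It assumes the idealized conditions stated above, and the class name `HardyWeinberg` is hypothetical.

```java
// Hypothetical illustration of Hardy-Weinberg proportions.
final class HardyWeinberg {

    /** Expected genotype frequencies {a1a1, a1a2, a2a2} = {p^2, 2pq, q^2}. */
    static double[] genotypeFrequencies(double p) {
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must lie in [0, 1]");
        }
        double q = 1.0 - p;
        return new double[] {p * p, 2.0 * p * q, q * q};
    }

    /** Share of all a2 copies carried by heterozygotes: 2pq / (2pq + 2q^2) = p. */
    static double shareOfA2InHeterozygotes(double p) {
        if (p >= 1.0) {
            throw new IllegalArgumentException("a2 is absent when p = 1");
        }
        double[] g = genotypeFrequencies(p);
        double inHets = g[1];       // one a2 copy per heterozygote
        double inHoms = 2.0 * g[2]; // two a2 copies per a2a2 homozygote
        return inHets / (inHets + inHoms);
    }

    public static void main(String[] args) {
        for (double q : new double[] {0.5, 0.1, 0.01}) {
            System.out.printf("q = %.2f: %.1f%% of a2 copies are in heterozygotes%n",
                q, 100.0 * shareOfA2InHeterozygotes(1.0 - q));
        }
    }
}
```

As $q$ falls toward zero, almost every copy of $a_2$ is in a heterozygote. In that case the fitness of the heterozygote, and therefore the dominance relationship, decides whether the allele spreads.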


4. Genetic Drift and the Neutral Theory 


In finite populations, random changes in allele frequencies result from variation in the
number of offspring between individuals and, if the species is diploid and sexual, from
Mendel's law of segregation. These random changes are called genetic drift. Genetic drift
affects evolution in two important ways: it acts as a dispersive force that removes genetic
variation, and it affects the probability that new mutations survive. Mutation counters the
dispersive force by putting variation back into the population. The neutral theory claims
that much molecular variation is due to the interaction of drift and mutation.

Genetic drift is an evolutionary force that changes both allele and genotype frequencies,
but its effects are very weak in large populations. Roughly speaking, the time required for
genetic drift to reduce the heterozygosity of the population by one-half is proportional to
the population size. Its effects are much more pronounced on rare alleles and on small,
subdivided populations with low migration rates. In the case of mutations, as long as the
mutation is neutral (the end products are functionally the same), the rate of substitution
of the mutation in the population equals the mutation rate; the rate of substitution does not
depend on the population size [Ref 32].

²⁰ A description of the genetic structure of the population must include a geographic
component if the ultimate goal is to understand the evolutionary forces responsible for
genetic variation.
The neutral theory claims that most allelic variation and substitutions in proteins and DNA
are neutral. It is sometimes called non-Darwinian evolution because, under this theory,
most substitutions are due to genetic drift rather than natural selection.
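The statement that drift halves heterozygosity in a time proportional to population size can be checked numerically. The sketch below assumes an idealized Wright-Fisher population of N diploid individuals, so each generation is formed by drawing 2N allele copies at random from the previous one. In that model the expected heterozygosity decays as $H_t = H_0(1 - 1/2N)^t$, which gives a half-life of roughly $2N\ln 2 \approx 1.39N$ generations. The class `GeneticDrift` and its methods are hypothetical, and the Wright-Fisher model is a standard textbook idealization rather than the model used in this thesis.

```java
import java.util.Random;

// Hypothetical illustration: genetic drift in an ideal Wright-Fisher population.
final class GeneticDrift {

    /** One generation: sample 2N allele copies with replacement from the current pool. */
    static double nextFrequency(double p, int n, Random rng) {
        int copies = 2 * n;
        int a1 = 0;
        for (int i = 0; i < copies; i++) {
            if (rng.nextDouble() < p) {
                a1++;
            }
        }
        return (double) a1 / copies;
    }

    /** Expected heterozygosity after t generations of pure drift. */
    static double expectedHeterozygosity(double h0, int n, int t) {
        return h0 * Math.pow(1.0 - 1.0 / (2.0 * n), t);
    }

    /** Generations for drift to halve expected heterozygosity (about 1.39 N). */
    static double halfLife(int n) {
        return Math.log(0.5) / Math.log(1.0 - 1.0 / (2.0 * n));
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double p = 0.5;
        int generations = 0;
        // p = 0 and p = 1 are absorbing: the allele has been lost or fixed
        while (p > 0.0 && p < 1.0) {
            p = nextFrequency(p, 50, rng);
            generations++;
        }
        System.out.println("allele " + (p == 1.0 ? "fixed" : "lost") + " after " + generations + " generations");
        System.out.printf("half-life N=100: %.1f, N=1000: %.1f%n", halfLife(100), halfLife(1000));
    }
}
```

A tenfold larger population takes about ten times as long to lose half its heterozygosity. This is why drift is described as weak in large populations.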
5. Natural Selection and Dominance 


Natural selection is the evolutionary force most responsible for adaptation to the
environment. Natural selection changes allele frequencies, but it only works when the
genotypes have different fitnesses.²¹ The dominance relationships between alleles that affect
fitness therefore affect the outcome of selection.

The concept of fitness in biological organisms is typically associated with the probability
that an individual will survive to adulthood in order to reproduce. The probability of
survival, or the fraction of individuals that survive, will in general depend on the genotype;
this probability is better known as viability. Thinking of viability as the fraction of
individuals that survive to adulthood makes it easier to see that the frequency of a genotype
after selection is proportional to the genotype's original frequency multiplied by its
viability (see Table 39). It is also easier to work with the relative fitnesses of genotypes,
since absolute fitness is a difficult quantity to determine and an individual's fitness may
include influences other than genotype.
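The proportionality described above can be sketched directly. Each genotype's frequency is multiplied by its viability, and the results are renormalized by the mean viability so they again sum to one. The example also shows why relative fitnesses suffice: dividing every viability by the same constant leaves the post-selection frequencies unchanged. The viabilities are made-up numbers and the class name `ViabilitySelection` is hypothetical. The genotype ordering is assumed to follow the $a_1a_1$, $a_1a_2$, $a_2a_2$ convention used in this appendix.

```java
import java.util.Arrays;

// Hypothetical illustration of viability selection on genotype frequencies.
final class ViabilitySelection {

    /** freqs and viability are ordered {a1a1, a1a2, a2a2}. */
    static double[] afterSelection(double[] freqs, double[] viability) {
        if (freqs.length != 3 || viability.length != 3) {
            throw new IllegalArgumentException("expected three genotypes");
        }
        double mean = 0.0; // mean viability of the population
        for (int i = 0; i < 3; i++) {
            mean += freqs[i] * viability[i];
        }
        double[] out = new double[3];
        for (int i = 0; i < 3; i++) {
            out[i] = freqs[i] * viability[i] / mean; // proportional to frequency x viability
        }
        return out;
    }

    public static void main(String[] args) {
        double[] before = {0.25, 0.50, 0.25};
        double[] absolute = {0.50, 0.40, 0.25}; // made-up survival probabilities
        double[] relative = {1.00, 0.80, 0.50}; // the same values divided by 0.50
        System.out.println(Arrays.toString(afterSelection(before, absolute)));
        System.out.println(Arrays.toString(afterSelection(before, relative)));
    }
}
```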

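Using the information in Table 39, the change in the allele frequency in a single generation
is determined by the equation:

$$\Delta p = \frac{pq\,[\,p(w_{11} - w_{12}) + q(w_{12} - w_{22})\,]}{p^2 w_{11} + 2pq\,w_{12} + q^2 w_{22}}$$

where $w_{11}$, $w_{12}$, and $w_{22}$ are the fitnesses of the $a_1a_1$, $a_1a_2$, and
$a_2a_2$ genotypes, and the denominator is the mean fitness $\bar{w}$ of the population.

²¹ The common fitnesses are individual fitness, genotype fitness, relative fitness, and
absolute fitness.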
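The $\Delta p$ expression can be cross-checked numerically against a direct calculation. The direct route weights the Hardy-Weinberg genotype frequencies by fitness, renormalizes by $\bar{w}$, and recounts $a_1$ copies, so $p' = (p^2 w_{11} + pq\,w_{12})/\bar{w}$. A minimal sketch with made-up fitness values follows; the class name `SelectionStep` is hypothetical.

```java
// Hypothetical illustration: one generation of selection at a two-allele locus.
final class SelectionStep {

    /** wbar = p^2 w11 + 2pq w12 + q^2 w22 */
    static double meanFitness(double p, double w11, double w12, double w22) {
        double q = 1.0 - p;
        return p * p * w11 + 2.0 * p * q * w12 + q * q * w22;
    }

    /** Closed form: delta p = pq[p(w11 - w12) + q(w12 - w22)] / wbar */
    static double deltaP(double p, double w11, double w12, double w22) {
        double q = 1.0 - p;
        return p * q * (p * (w11 - w12) + q * (w12 - w22)) / meanFitness(p, w11, w12, w22);
    }

    /** Direct route: a1 copies among survivors, p' = (p^2 w11 + pq w12) / wbar. */
    static double deltaPDirect(double p, double w11, double w12, double w22) {
        double q = 1.0 - p;
        double pNext = (p * p * w11 + p * q * w12) / meanFitness(p, w11, w12, w22);
        return pNext - p;
    }

    public static void main(String[] args) {
        double p = 0.3, w11 = 1.0, w12 = 0.9, w22 = 0.7; // made-up fitnesses
        System.out.println("closed form: " + deltaP(p, w11, w12, w22));
        System.out.println("direct:      " + deltaPDirect(p, w11, w12, w22));
    }
}
```

The two methods agree up to floating-point rounding, which confirms the algebra that leads from the genotype table to the closed form.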
Mendel described complete dominance, in which one allele clearly masks the existence of the
other. However, incomplete dominance, overdominance, and underdominance are of greater
interest in evolution, because most cases of complete dominance involve morphological
traits. The type of dominance is determined by a parameter, $h$, called the heterozygous
effect, when comparing the relative fitnesses of the genotypes. It measures the fitness of
the heterozygote relative to the selective difference between the two homozygotes: the
relative fitnesses are $w_{11} = 1$, $w_{12} = 1 - hs$, and $w_{22} = 1 - s$, where $s$ is
the selection coefficient (see Figure 26).
Figure 26. Relative Fitness of a One Locus, Two Allele Genotype
By using the relative fitnesses, the change in the allele frequency can be rewritten as

$$\Delta p = \frac{pqs\,[\,ph + q(1 - h)\,]}{\bar{w}}$$

and the mean fitness for the population becomes $\bar{w} = 1 - 2pqhs - q^2 s$.
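The heterozygous-effect form is the general $\Delta p$ equation with $w_{11} = 1$, $w_{12} = 1 - hs$ and $w_{22} = 1 - s$ substituted, since $p(w_{11} - w_{12}) + q(w_{12} - w_{22}) = s[ph + q(1 - h)]$. The sketch below implements this form and names the dominance type that a given $h$ implies. The labels follow the text. Treating $h = 0$ and $h = 1$ as complete dominance (the heterozygote matches one homozygote) is an added assumption. The class name `HeterozygousEffect` is hypothetical.

```java
// Hypothetical illustration of the heterozygous-effect parametrization.
final class HeterozygousEffect {

    /** wbar = 1 - 2pqhs - q^2 s, assuming w11 = 1, w12 = 1 - hs, w22 = 1 - s. */
    static double meanFitness(double p, double h, double s) {
        double q = 1.0 - p;
        return 1.0 - 2.0 * p * q * h * s - q * q * s;
    }

    /** delta p = pqs[ph + q(1 - h)] / wbar */
    static double deltaP(double p, double h, double s) {
        double q = 1.0 - p;
        return p * q * s * (p * h + q * (1.0 - h)) / meanFitness(p, h, s);
    }

    /** Dominance relationship implied by h. */
    static String dominanceType(double h) {
        if (h < 0.0) return "overdominance";              // heterozygote fitter than both homozygotes
        if (h > 1.0) return "underdominance";             // heterozygote less fit than both
        if (h == 0.0 || h == 1.0) return "complete dominance";
        return "incomplete dominance";                    // 0 < h < 1
    }

    public static void main(String[] args) {
        double p = 0.4, h = 0.25, s = 0.2;
        System.out.println(dominanceType(h) + ": delta p = " + deltaP(p, h, s)
            + ", mean fitness = " + meanFitness(p, h, s));
    }
}
```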

The three evolutionarily significant types of dominance are closely associated with the three
types of natural selection. Incomplete dominance is linked with directional selection,
overdominance with balancing selection, and underdominance with disruptive selection.
Directional selection is the type of selection to which Darwin unknowingly referred, since
population genetics did not yet exist. This type of selection implies that the fitness of
$a_1a_1$ exceeds that of $a_1a_2$, which in turn exceeds that of $a_2a_2$. This occurs with
incomplete dominance ($0 < h < 1$). As a result, the frequency $p$ is always increasing (see
Figure 27). The rate at which $p$ changes depends strongly on $p$ itself. Evolution proceeds
very slowly when there is little genetic variation ($p$ close to zero or close to one) and is
most effective when $p$ is intermediate, near $1/2$.
Figure 27. Change in the Allele Frequency in a Single Generation
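The dependence of the rate of change on $p$ itself can be seen by stepping the heterozygous-effect recursion forward. The sketch below assumes directional selection ($0 \le h \le 1$) with made-up values $h = 0.5$ and $s = 0.1$. It prints $\Delta p$ at several frequencies and counts how many generations the same 0.01 increase in $p$ takes near $p = 0$ compared with near $p = 1/2$. The class name `DirectionalSelection` is hypothetical.

```java
// Hypothetical illustration of directional selection (0 <= h <= 1).
final class DirectionalSelection {

    /** delta p = pqs[ph + q(1 - h)] / wbar with wbar = 1 - 2pqhs - q^2 s. */
    static double deltaP(double p, double h, double s) {
        double q = 1.0 - p;
        double wbar = 1.0 - 2.0 * p * q * h * s - q * q * s;
        return p * q * s * (p * h + q * (1.0 - h)) / wbar;
    }

    /** Generations needed for p to rise from p0 to at least target. */
    static int generationsToReach(double p0, double target, double h, double s) {
        if (p0 <= 0.0 || target >= 1.0 || h < 0.0 || h > 1.0 || s <= 0.0 || s > 1.0) {
            throw new IllegalArgumentException("need 0 < p0, target < 1, 0 <= h <= 1, 0 < s <= 1");
        }
        double p = p0;
        int t = 0;
        while (p < target) { // p rises monotonically toward 1, so this terminates
            p += deltaP(p, h, s);
            t++;
        }
        return t;
    }

    public static void main(String[] args) {
        double h = 0.5, s = 0.1;
        for (double p : new double[] {0.01, 0.25, 0.5, 0.75, 0.99}) {
            System.out.printf("p = %.2f  delta p = %.5f%n", p, deltaP(p, h, s));
        }
        System.out.println("0.01 -> 0.02 takes " + generationsToReach(0.01, 0.02, h, s)
            + " generations; 0.49 -> 0.50 takes " + generationsToReach(0.49, 0.50, h, s));
    }
}
```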


Balancing selection occurs when the alleles exhibit overdominance ($h < 0$). From Figure 28,
when $p$ is close to zero, the allele frequency will increase; when it is close to one, the
allele frequency will decrease. There is a point in between, called the equilibrium value, at
which the frequency no longer changes.
Figure 28. Balanced Selection
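The equilibrium value follows from setting the bracket in the $\Delta p$ expression to zero. From $ph + (1 - p)(1 - h) = 0$ we get $\hat{p} = (1 - h)/(1 - 2h)$, which lies strictly between 0 and 1 when $h < 0$. The sketch below iterates the recursion from both ends, using made-up values $h = -1$ and $s = 0.2$ (so $w_{12} = 1.2$), and shows both runs settling at $\hat{p} = 2/3$. The class name `BalancingSelection` is hypothetical.

```java
// Hypothetical illustration of balancing selection (overdominance, h < 0).
final class BalancingSelection {

    static double deltaP(double p, double h, double s) {
        double q = 1.0 - p;
        double wbar = 1.0 - 2.0 * p * q * h * s - q * q * s;
        return p * q * s * (p * h + q * (1.0 - h)) / wbar;
    }

    /** Interior equilibrium from ph + q(1 - h) = 0. */
    static double equilibrium(double h) {
        return (1.0 - h) / (1.0 - 2.0 * h);
    }

    /** Applies the one-generation recursion repeatedly. */
    static double iterate(double p0, double h, double s, int generations) {
        double p = p0;
        for (int t = 0; t < generations; t++) {
            p += deltaP(p, h, s);
        }
        return p;
    }

    public static void main(String[] args) {
        double h = -1.0, s = 0.2; // w11 = 1, w12 = 1.2, w22 = 0.8
        System.out.println("equilibrium p = " + equilibrium(h));
        System.out.println("from 0.05: " + iterate(0.05, h, s, 500));
        System.out.println("from 0.95: " + iterate(0.95, h, s, 500));
    }
}
```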


In disruptive selection, the outcome depends on the initial allele frequency. If the initial
value is less than the equilibrium value, the frequency will approach zero; if it is greater
than the equilibrium value, the frequency will approach one. If the two are equal, there will
be no change at all, but the equilibrium is unstable: a small change in $p$, such as that
caused by genetic drift, will move the allele frequency away from the equilibrium value. This
behavior occurs only when there is underdominance ($h > 1$; see Figure 29).
Figure 29. Disruptive Selection
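Under underdominance the same formula $\hat{p} = (1 - h)/(1 - 2h)$ still gives an interior equilibrium, but it is unstable. The sketch below uses made-up values $h = 2$ and $s = 0.2$ (so $w_{12} = 0.6$ and $\hat{p} = 1/3$). It starts the recursion 0.01 below and 0.01 above $\hat{p}$, and the two runs head to opposite boundaries. The class name `DisruptiveSelection` is hypothetical.

```java
// Hypothetical illustration of disruptive selection (underdominance, h > 1).
final class DisruptiveSelection {

    static double deltaP(double p, double h, double s) {
        double q = 1.0 - p;
        double wbar = 1.0 - 2.0 * p * q * h * s - q * q * s;
        return p * q * s * (p * h + q * (1.0 - h)) / wbar;
    }

    /** Unstable interior equilibrium from ph + q(1 - h) = 0. */
    static double equilibrium(double h) {
        return (1.0 - h) / (1.0 - 2.0 * h);
    }

    static double iterate(double p0, double h, double s, int generations) {
        double p = p0;
        for (int t = 0; t < generations; t++) {
            p += deltaP(p, h, s);
        }
        return p;
    }

    public static void main(String[] args) {
        double h = 2.0, s = 0.2;       // w11 = 1, w12 = 0.6, w22 = 0.8
        double pStar = equilibrium(h); // 1/3
        System.out.println("equilibrium p = " + pStar);
        System.out.println("start just below: " + iterate(pStar - 0.01, h, s, 1000));
        System.out.println("start just above: " + iterate(pStar + 0.01, h, s, 1000));
    }
}
```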
APPENDIX D. MODIFIED GPSYS-1.1 CODE AND NEWLY WRITTEN CODE


1. CHROMOSOME


/*
  Copyright Adil Qureshi - 1997
  This code is part of gpsys Release 1.1
  and is released for non-commercial use only.
  Questions, comments etc should be forwarded to :-

        Adil Qureshi
        University College London,
        Department of Computer Science,
        Gower St,
        London WC1E 6BT, UK.
        Tel: +44 (0)171 380 7777 x4436
        Fax: +44 (0)171 387 1397
        email: A.Qureshi@cs.ucl.ac.uk
        URL  : http://www.cs.ucl.ac.uk/staff/A.Qureshi/

  Modified by Captain Loretta Vandenberg
  Naval Postgraduate School
  Monterey, CA 93943
  All changes are marked. Some additional comments
  were added to the original code in order to clarify
  usage.
*/


package gpsys; 


/**
 * A Chromosome defines an evolvable gene tree.
 *
 * @see gpsys.Terminal
 *
 * @version 1.1
 * @author <a href="mailto:A.Qureshi@cs.ucl.ac.uk">Adil Qureshi</a>
 * <address>Department of Computer Science,</address>
 * <address>University College London,</address>
 * <address>Gower St,</address>
 * <address>London WC1E 6BT,</address>
 * <address>UK.</address>
 *
 * @author <a href="mailto:shirley@cs.nps.navy.mil">Rett Vandenberg</a>
 * <address>Department of Computer Science</address>
 * <address>Naval Postgraduate School</address>
 * <address>Monterey, CA 93943</address>
 */
public class Chromosome implements Cloneable, java.io.Serializable {


    /**
     * The Gene at the top of the tree.
     * This gene is evaluated when this ADF needs evaluation.
     *
     * @see Gene
     */
    public Gene treeTop;

    /**
     * The GPParameters used to create this chromosome.
     *
     * @see GPParameters
     */
    public GPParameters gpParameters;

    /**
     * The index into the adf array in the gpParameters. It is used to
     * access the ChromosomeParameters associated with this chromosome.
     */
    public int adf;

    /**
     * A count of the total number of nodes in this tree.
     */
    int complexity;

    /**
     * Added by Rett Vandenberg
     * A count of the total number of introns in this tree.
     */
    int introns;

    /**
     * Added by Rett Vandenberg
     * A count of the total number of dominants in this tree.
     */
    int dominants;

    /**
     * Added by Rett Vandenberg
     * A count of the total number of recessives in this tree.
     */
    int recessives;
    /**
     * Constructs a new Chromosome using the specified
     * GPParameters.
     *
     * @param p the GPParameters to use.
     * @param adfIndex the index into the adf array
     *        in GPParameters; the latter
     *        defines ChromosomeParameters
     *        for each adf.
     * @exception TypeException If there was a typing problem
     *        during tree generation. For
     *        example a Function or Terminal
     *        of the required type could not
     *        be found.
     */


    Chromosome(GPParameters p, int adfIndex) throws TypeException {
        gpParameters = p;
        adf = adfIndex;
        // always start the tree with a function
        int createMethod = p.adf[adfIndex].createMethod;
        if (createMethod == ChromosomeParameters.CREATE_GROW) {
            treeTop = new GeneFunctionGrow(
                p.adf[adfIndex].maxDepthAtCreation - 1,
                p.adf[adfIndex].type, p, adfIndex);
        } else if (createMethod == ChromosomeParameters.CREATE_FULL) {
            treeTop = new GeneFunctionFull(
                p.adf[adfIndex].maxDepthAtCreation - 1,
                p.adf[adfIndex].type, p, adfIndex);
        } else { // assume (createMethod == CREATE_RAMP_HALF_AND_HALF)
            // ramp up the depth;
            // minimum = 2, maximum = maxDepthAtCreation
            // (the cast ensures that, if both fields are ints, the
            // ratio is not truncated to zero by integer division)
            double depthValue =
                p.adf[adfIndex].maxDepthAtCreation *
                ((double) p.creationIndex / p.populationSize);
            int maxDepth =
                (depthValue < 2) ? 2 : (int) depthValue;

            // half the population created via grow,
            // and the other via full
            if ((p.creationIndex % 2) == 0) {
                treeTop = new GeneFunctionGrow(maxDepth,
                    p.adf[adfIndex].type, p, adfIndex);
            } else {
                treeTop = new GeneFunctionFull(maxDepth,
                    p.adf[adfIndex].type, p, adfIndex);
            }
        }

        complexity = treeTop.complexity();
        // added by Rett Vandenberg
        introns = treeTop.introns();
        dominants = treeTop.dominants();
        recessives = treeTop.recessives();
    }
    /**
     * Creates a new child Chromosome as a mutation of the
     * mother.
     *
     * @param mum is the mother Chromosome.
     * @return a reference to a child Chromosome which is a
     *         mutation of the mother.
     */


    public static Chromosome mutate(Chromosome mum) {

        // a convenience variable
        GPParameters gpParameters = mum.gpParameters;

        // initially the child is a clone of mum
        Chromosome child = mum.deepClone();

        // pick a branch at random from the child
        GeneBranch branch = new
            GeneBranch(gpParameters.rng, child.treeTop);

        // just for convenience
        int depth =
            gpParameters.adf[child.adf].maxDepthMutation - 1;

        // now generate the mutant branch using FULL or
        // GROW methods
        Gene newBranch;
        if ((gpParameters.rng.nextInt() % 2) == 0) {
            // use FULL method
            try {
                newBranch = new
                    GeneFunctionFull(depth,
                        branch.child.p.type,
                        gpParameters, child.adf);
            }
            catch (TypeException e) {
                newBranch = null;
            }
        } else {
            // use GROW method
            try {
                newBranch = new
                    GeneFunctionGrow(depth,
                        branch.child.p.type,
                        gpParameters, child.adf);
            }
            catch (TypeException e) {
                newBranch = null;
            }
        }

        if (newBranch == null) {
            // no function branch of the required type could be
            // built, so try a terminal instead
            try {
                newBranch = new GeneTerminal(depth,
                    branch.child.p.type,
                    gpParameters, child.adf);
            }
            catch (TypeException e) {
                // nothing usable; the child is an unmutated copy of mum
                return child;
            }
        }

        // if the root of child's tree was mutated, replace
        // the whole tree
        if (branch.parent == null) {
            child.treeTop = newBranch;
        }
        else {
            // save the branch to be mutated
            Gene tmp =
                ((GeneFunction) branch.parent).arguments[branch.index];

            // update the tree to include the mutated branch
            ((GeneFunction) branch.parent).arguments[branch.index] =
                newBranch;

            // if the mutated tree is too big, child == mum
            if (child.treeTop.depth() > gpParameters.adf[child.adf].maxDepth)
            {
                ((GeneFunction) branch.parent).arguments[branch.index] = tmp;
                gpParameters.observer.diagnosticUpdate(
                    "Throwing away a tree after mutation");
            }
        }
        child.complexity = child.treeTop.complexity();

        // added by Rett Vandenberg
        child.introns = child.treeTop.introns();
        child.dominants = child.treeTop.dominants();
        child.recessives = child.treeTop.recessives();

        return child;
    }


    /**
     * Creates a new child Chromosome via crossover of the mother
     * and father Chromosomes.
     *
     * @param mum is the mother Chromosome. The child is
     *        actually a copy of the mother with one branch
     *        exchanged with a branch from the father.
     * @param dad is the father Chromosome.
     * @return a reference to a new child Chromosome.
     */
    public static Chromosome cross(Chromosome mum, Chromosome dad) {

        // a convenience variable
        GPParameters gpParameters = mum.gpParameters;

        // this does not prevent it, it just warns you it is happening
        // I never saw it occur
        if (mum == dad) {
            gpParameters.observer.diagnosticUpdate(
                "Danger - XOVER is incestuous");
        }

        // make the child a clone of mum
        Chromosome child = mum.deepClone();

        // pick a branch at random from the child to replace
        GeneBranch branchMum =
            new GeneBranch(gpParameters.rng, child.treeTop);

        // pick a branch from dad to replace it with;
        // the new branch must return the same type as the
        // mum branch
        GeneBranch branchDad = new
            GeneBranch(gpParameters.rng, dad.treeTop,
                branchMum.child.p.type);

        // if no such branch is found in dad, child == mum
        if (branchDad.child == null) {
            gpParameters.observer.diagnosticUpdate(
                "Couldn't find compatible branch in dad during crossover");
            return child;
        }

        // make a copy of dad's branch
        Gene newBranch = branchDad.child.deepClone();

        // if replacing root of the child tree, tree = copy
        // of dad's branch
        if (branchMum.parent == null) child.treeTop = newBranch;
        else {
            // save the branch being replaced
            Gene tmp = ((GeneFunction) branchMum.parent).
                arguments[branchMum.index];

            // replace the branch with one from dad
            ((GeneFunction) branchMum.parent).
                arguments[branchMum.index] = newBranch;


            // Added by Rett Vandenberg
            // dominant/recessive test
            // based on Mendelian heredity

            // double dominant parents make a double dominant child
            if (tmp.allele == 2 && newBranch.allele == 2) {
                child.treeTop.allele = 2;

            // one double dominant and one double recessive can only
            // produce heterozygous kids
            } else if (tmp.allele == 2 && newBranch.allele == 0) {
                child.treeTop.allele = 1;
            } else if (tmp.allele == 0 && newBranch.allele == 2) {
                child.treeTop.allele = 1;

            // two double recessives can only produce a recessive child
            } else if (tmp.allele == 0 && newBranch.allele == 0) {
                child.treeTop.allele = 0;

            // if you have a heterozygous parent and a double recessive
            // parent, you have a 50% chance of a heterozygous child
            // and a 50% chance of a double recessive child
            } else if ((tmp.allele == 1 && newBranch.allele == 0) ||
                       (tmp.allele == 0 && newBranch.allele == 1)) {
                // this allows a 50% chance based on the
                // random seed provided
                if ((gpParameters.rng.nextInt() % 2) == 0) {
                    child.treeTop.allele = 1;
                } else {child.treeTop.allele = 0;}

            // a double dominant parent with a heterozygous parent
            } else if ((tmp.allele == 2 && newBranch.allele == 1) ||
                       (tmp.allele == 1 && newBranch.allele == 2)) {
                // 50% chance of double dominant or heterozygous
                if ((gpParameters.rng.nextInt() % 2) == 0) {
                    child.treeTop.allele = 2;
                } else {child.treeTop.allele = 1;}
            } else {
                // both parents are heterozygous:
                // there's a 50% chance of a heterozygous child
                if ((gpParameters.rng.nextInt() % 2) == 0) {
                    child.treeTop.allele = 1;

                // a 25% chance of a double dominant
                } else if ((gpParameters.rng.nextInt() % 2) == 0) {
                    child.treeTop.allele = 2;

                // and the remaining 25% (1/2 x 1/2) gives a double
                // recessive; these are expected proportions, so the
                // observed share in any single run only approximates 25%
                } else {child.treeTop.allele = 0;}
            }


            // if the resulting tree is too big, child == mum
            if (child.treeTop.depth() > gpParameters.adf[child.adf].maxDepth) {
                ((GeneFunction) branchMum.parent).arguments[branchMum.index] =
                    tmp;
                gpParameters.observer.diagnosticUpdate(
                    "Throwing away a tree after Xover");
            }
        }
        child.complexity = child.treeTop.complexity();

        // added by Rett Vandenberg
        child.introns = child.treeTop.introns();
        child.dominants = child.treeTop.dominants();
        child.recessives = child.treeTop.recessives();

        return child;
    }//end cross


    /**
     * Makes a deep copy of this Chromosome by making a copy of the entire
     * Gene tree.
     *
     * @return a reference to a copy of this Chromosome.
     */
    public Chromosome deepClone() {
        Chromosome clone = null;
        try {
            clone = (Chromosome) this.clone();
            clone.treeTop = this.treeTop.deepClone();
        }
        catch (CloneNotSupportedException e) {
            // will never happen as long as we implement Cloneable
        }

        return clone;
    }


    /**
     * Calculates the number of nodes in the Gene tree of this Chromosome.
     *
     * @return the number of nodes in the Gene tree.
     */
    public int complexity() {return complexity;}

    /**
     * Added by Rett Vandenberg
     * Calculates the number of introns in the Gene tree of this
     * Chromosome.
     *
     * @return the number of introns in the Gene tree.
     */
    public int introns() {return introns;}

    /**
     * Added by Rett Vandenberg
     * Calculates the number of dominants in the Gene tree of
     * this Chromosome.
     *
     * @return the number of dominants in the Gene tree.
     */
    public int dominants() {return dominants;}

    /**
     * Added by Rett Vandenberg
     * Calculates the number of recessives in the Gene tree of this
     * Chromosome.
     *
     * @return the number of recessives in the Gene tree.
     */
    public int recessives() {return recessives;}


    /**
     * Evaluates the chromosome so that it returns an Object
     * reference.
     *
     * @param i is the Individual being evaluated.
     * @return a reference to an Object returned by evaluating the
     *         Gene tree.
     * @exception EvaluationException If there is an
     *            evaluation failure.
     */
    public Object evaluateObject(Individual i) throws
        EvaluationException {return treeTop.evaluateObject(i);}

    /**
     * Evaluates the chromosome so that it returns a byte.
     *
     * @param i is the Individual being evaluated.
     * @return a byte returned by evaluating the Gene tree.
     * @exception EvaluationException If there is an
     *            evaluation failure.
     */
    public byte evaluateByte(Individual i) throws
        EvaluationException {return treeTop.evaluateByte(i);}

    /**
     * Evaluates the chromosome so that it returns a short.
     *
     * @param i is the Individual being evaluated.
     * @return a short returned by evaluating the Gene tree.
     * @exception EvaluationException If there is an
     *            evaluation failure.
     */
    public short evaluateShort(Individual i) throws
        EvaluationException {return treeTop.evaluateShort(i);}

    /**
     * Evaluates the chromosome so that it returns an int.
     *
     * @param i is the Individual being evaluated.
     * @return an int returned by evaluating the Gene tree.
     * @exception EvaluationException If there is an
     *            evaluation failure.
     */
    public int evaluateInt(Individual i) throws
        EvaluationException {return treeTop.evaluateInt(i);}

    /**
     * Evaluates the chromosome so that it returns a long.
     *
     * @param i is the Individual being evaluated.
     * @return a long returned by evaluating the Gene tree.
     * @exception EvaluationException If there is an
     *            evaluation failure.
     */
    public long evaluateLong(Individual i) throws
        EvaluationException {return treeTop.evaluateLong(i);}

    /**
     * Evaluates the chromosome so that it returns a float.
     *
     * @param i is the Individual being evaluated.
     * @return a float returned by evaluating the Gene tree.
     * @exception EvaluationException If there is an
     *            evaluation failure.
     */
    public float evaluateFloat(Individual i) throws
        EvaluationException {return treeTop.evaluateFloat(i);}

    /**
     * Evaluates the chromosome so that it returns a double.
     *
     * @param i is the Individual being evaluated.
     * @return a double returned by evaluating the Gene tree.
     * @exception EvaluationException If there is an
     *            evaluation failure.
     */
    public double evaluateDouble(Individual i) throws
        EvaluationException {return treeTop.evaluateDouble(i);}

    /**
     * Evaluates the chromosome so that it returns a char.
     *
     * @param i is the Individual being evaluated.
     * @return a char returned by evaluating the Gene tree.
     * @exception EvaluationException If there is an
     *            evaluation failure.
     */
    public char evaluateChar(Individual i) throws
        EvaluationException {return treeTop.evaluateChar(i);}

    /**
     * Evaluates the chromosome so that it returns a boolean.
     *
     * @param i is the Individual being evaluated.
     * @return a boolean returned by evaluating the Gene tree.
     * @exception EvaluationException If there is an
     *            evaluation failure.
     */
    public boolean evaluateBoolean(Individual i) throws
        EvaluationException {return treeTop.evaluateBoolean(i);}


    /**
     * Generates a String representing a dump of the Gene tree for
     * this Chromosome.
     *
     * @return a String representing the Gene tree for this Chromosome.
     */
    public String toString() {return treeTop.toString();}

}//end Chromosome


2. GENE


/*
  Copyright Adil Qureshi - 1997
  This code is part of gpsys Release 1.1
  and is released for non-commercial use only.
  Questions, comments etc should be forwarded to :-

        Adil Qureshi
        University College London,
        Department of Computer Science,
        Gower St,
        London WC1E 6BT, UK.
        Tel: +44 (0)171 380 7777 x4436
        Fax: +44 (0)171 387 1397
        email: A.Qureshi@cs.ucl.ac.uk
        URL  : http://www.cs.ucl.ac.uk/staff/A.Qureshi/

  Modified by Captain Loretta Vandenberg
  Department of Computer Science
  Naval Postgraduate School
  Monterey, CA 93943
*/

package gpsys; 


/**
 * A Gene is a node in a GP tree. A Gene can be either a function or a
 * terminal, hence this abstract class has been subclassed to
 * GeneFunction and GeneTerminal, which are actually used.
 *
 * @version 1.1
 * @author <a href="mailto:A.Qureshi@cs.ucl.ac.uk">Adil Qureshi</a>
 * <address>Department of Computer Science,</address>
 * <address>University College London,</address>
 * <address>Gower St,</address>
 * <address>London WC1E 6BT,</address>
 * <address>UK.</address>
 *
 * @author <a href="mailto:shirley@cs.nps.navy.mil">Rett Vandenberg</a>
 * <address>Department of Computer Science</address>
 * <address>Naval Postgraduate School</address>
 * <address>Monterey, CA 93943</address>
 */


public abstract class Gene implements java.io.Serializable { 


    /**
     * The primitive associated with this Gene, which is either a
     * Function or a Terminal.
     */
    public Primitive p;

    /**
     * Added by Rett Vandenberg
     * A flag to determine whether or not this particular gene is
     * an intron. It is always false for terminals and for
     * non-intron function genes.
     */
    public boolean intron;

    /**
     * Added by Rett Vandenberg
     * A flag to determine whether this particular gene is
     * a dominant or a recessive. It is always set to false for
     * terminals and intron function genes. Recessive functions will
     * be set to true. Dominant functions are those provided.
     */
    public boolean dominant;

    /**
     * Added by Rett Vandenberg
     * An integer to determine whether this particular gene is
     * a dominant or a recessive: 2 is assigned for double dominants,
     * 1 for heterozygous dominants, and 0 for double recessives.
     * The initial population is assigned values at random. The value
     * is checked during cross (see Chromosome) to determine the value
     * given to offspring.
     */
    public int allele;


    /**
     * Calculates the number of nodes (Genes) in the subtree
     * starting at this node (Gene).
     */
    abstract public int complexity();

    /**
     * Calculates the maximum depth of the subtree starting at
     * this node (Gene).
     */
    abstract public int depth();

    /**
     * Added by Rett Vandenberg
     * An abstract method that determines the number
     * of introns in a given sub-branch. It must be implemented
     * by the GeneFunction class.
     */
    abstract public int introns();

    /**
     * Added by Rett Vandenberg
     * An abstract method that determines the number of
     * dominants in a given sub-branch. It must be implemented
     * by the GeneFunction class.
     */
    abstract public int dominants();

    /**
     * Added by Rett Vandenberg
     * An abstract method that determines the number of
     * recessives in a given sub-branch. It must be implemented
     * by the GeneFunction class.
     */
    abstract public int recessives();


    /**
     * Added by Rett Vandenberg
     * Intron test. Used in fitness evaluation.
     */
    public boolean isIntron() {
        return intron;
    }

    /**
     * Added by Rett Vandenberg
     * Dominant test. Used in fitness evaluation.
     */
    public boolean isDominant() {
        return dominant;
    }

    /**
     * Added by Rett Vandenberg
     * Recessive test. Used in fitness evaluation.
     */
    public boolean isRecessive() {
        return !dominant;
    }


i 


ws 


Evaluate this Gene as a Gene that returns an Object when 
evaluated. 


@param a the Individual being evaluated. 

@return An Object which is the evaluation result. 

@exception EvaluationException If there is an evaluation 
failure. 


public abstract Object evaluateObject (Individual i) 


oh 


throws EvaluationException; 


Evaluate this Gene as a Gene that returns a byte when evaluated. 

@param al the Individual being evaluated. 

@return A byte which is the evaluation result. 

@exception EvaluationException If there 1s an evaluation 
failure. 


public abstract byte evaluateByte(Individual i) 


| 


* 


evaluat 
* 


* 
* 
* 
* 


ae 


throws EvaluationException; 


Evaluate this Gene as a Gene that returns a short when 


ed. 

@param i the Individual being evaluated. 

@return A short which is the evaluation result. 

@exception EvaluationException If there is an evaluation 
failure. 


public abstract short evaluateShort (Individual i) 


[** 


* 


as 


throws EvaluationException; 


Evaluate this Gene as a Gene that returns a int when evaluated. 


@param 1 the Individual being evaluated. 

@return A int which is the evaluation result. 

@exception EvaluationException If there is an evaluation 
failure. 


public abstract int evaluateInt (Individual 1) 


throws EvaluationException; 


Evaluate this Gene as a Gene that returns a long when evaluated. 

@param af the Individual being evaluated. 

@return A long which is the evaluation result. 

@exception EvaluationException If there is an evaluation 
failure. 
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ad 
public abstract long evaluateLong(Individual 1) 
throws EvaluationException; 


pts 
* Evaluate this Gene as a Gene that returns a float when 


evaluated. 
* 


@param a the Individual being evaluated. 

@return A float which is the evaluation result. 
* @exception EvaluationException If there is an evaluation 
a failure. 


ay, 
public abstract float evaluateFloat (Individual i) 
throws EvaluationException; 


    /**
     * Evaluate this Gene as a Gene that returns a double when
     * evaluated.
     *
     * @param i the Individual being evaluated.
     *
     * @return A double which is the evaluation result.
     *
     * @exception EvaluationException If there is an
     *            evaluation failure.
     */
    public abstract double evaluateDouble(Individual i)
        throws EvaluationException;


    /**
     * Evaluate this Gene as a Gene that returns a char when
     * evaluated.
     *
     * @param i the Individual being evaluated.
     *
     * @return A char which is the evaluation result.
     *
     * @exception EvaluationException If there is an
     *            evaluation failure.
     */
    public abstract char evaluateChar(Individual i)
        throws EvaluationException;


    /**
     * Evaluate this Gene as a Gene that returns a boolean when
     * evaluated.
     *
     * @param i the Individual being evaluated.
     *
     * @return A boolean which is the evaluation result.
     *
     * @exception EvaluationException If there is an
     *            evaluation failure.
     */
    public abstract boolean evaluateBoolean(Individual i)
        throws EvaluationException;

    /**
     * Makes a deep clone of this Gene, i.e. the Gene and all of its
     * subtrees.
     *
     * @return A clone of the Gene and its subtrees.
     */
    abstract public Gene deepClone();


    /**
     * Converts the Gene and its subtrees into a String.
     *
     * @return A String representing the Gene and any subtrees.
     */
    abstract public String toString();
}//End Gene 
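The abstract Gene class declares one evaluateXxx method per Java return type, so a function node can ask each argument for exactly the type it needs and combine the results. The following is a minimal, self-contained sketch of that recursive dispatch, assuming a much simplified hierarchy: MiniGene, MiniTerminal, MiniFunction and GeneEvalDemo are hypothetical stand-ins rather than gpsys classes, only the double-valued method is shown, and the Individual parameter is replaced by a single variable x.

```java
// Hypothetical stand-in for gpsys.Gene: only double-valued evaluation is sketched.
abstract class MiniGene {
    abstract double evaluateDouble(double x);
}

// A leaf: either the variable x or a constant (stands in for a GeneTerminal).
class MiniTerminal extends MiniGene {
    private final boolean isVariable;
    private final double constant;

    MiniTerminal(boolean isVariable, double constant) {
        this.isVariable = isVariable;
        this.constant = constant;
    }

    @Override
    double evaluateDouble(double x) {
        return isVariable ? x : constant;
    }
}

// An internal node: evaluates both arguments recursively, then combines them.
class MiniFunction extends MiniGene {
    private final char op;              // '+' adds, anything else multiplies
    private final MiniGene[] arguments;

    MiniFunction(char op, MiniGene... arguments) {
        this.op = op;
        this.arguments = arguments;
    }

    @Override
    double evaluateDouble(double x) {
        double a = arguments[0].evaluateDouble(x);
        double b = arguments[1].evaluateDouble(x);
        return (op == '+') ? a + b : a * b;
    }
}

class GeneEvalDemo {
    // Builds the tree (+ (* x x) 1), i.e. x*x + 1.
    static MiniGene buildTree() {
        MiniGene x = new MiniTerminal(true, 0.0);
        MiniGene one = new MiniTerminal(false, 1.0);
        return new MiniFunction('+', new MiniFunction('*', x, x), one);
    }
}
```

In gpsys itself the combining step lives in the Function primitive (for example RAdd or RMul), and GeneFunction simply forwards its argument array to it.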


3. GENEFUNCTION 


/*
 * Copyright Adil Qureshi - 1997
 * This code is part of gpsys Release 1.1
 * and is released for non-commercial use only.
 * Questions, comments etc should be forwarded to :-
 *
 *     Adil Qureshi
 *     University College London,
 *     Department of Computer Science,
 *     Gower St,
 *     London WC1E 6BT, UK.
 *     Tel: +44 (0)171 380 7777 x4436
 *     Fax: +44 (0)171 387 1397
 *     email: A.Qureshi@cs.ucl.ac.uk
 *     URL : http://www.cs.ucl.ac.uk/staff/A.Qureshi/
 *
 * Modified by Captain Loretta Vandenberg
 * Naval Postgraduate School
 * Monterey, CA 93943
 */

package gpsys;


/**
 * A GeneFunction is a Gene tree representing a function call.
 * A GeneFunction therefore has Gene arguments (branches or subtrees).
 *
 * @see gpsys.Gene
 * @see gpsys.GeneFunctionGrow
 * @see gpsys.GeneFunctionFull
 *
 * @version 1.1, 30th June '97
 * @author <a href="mailto:A.Qureshi@cs.ucl.ac.uk">Adil Qureshi</a>
 * <address>Department of Computer Science,</address>
 * <address>University College London,</address>
 * <address>Gower St,</address>
 * <address>London WC1E 6BT,</address>
 * <address>UK.</address>
 *
 * @author <a href="mailto:shirley@cs.nps.navy.mil">Rett Vandenberg</a>
 * <address>Department of Computer Science</address>
 * <address>Naval Postgraduate School</address>
 * <address>Monterey, CA 93943</address>
 */

public abstract class GeneFunction extends Gene implements Cloneable { 


    /**
     * The arguments for this function call. The length of this array
     * is equal to the number of arguments taken by the function
     * referenced by this Gene.
     */
    Gene[] arguments;


    /**
     * Evaluate this Gene as Function returning an Object reference.
     *
     * @param i the individual to which this Gene belongs
     * @return An Object which is the result of the evaluation.
     * @exception EvaluationException If there is an evaluation
     *            failure.
     */
    public Object evaluateObject(Individual i) throws
        EvaluationException {
        return ((Function) p).evaluateObject(i, arguments);
    }


    /**
     * Evaluate this Gene as Function returning a byte.
     *
     * @param i the individual to which this Gene belongs
     * @return A byte which is the result of the evaluation.
     * @exception EvaluationException If there is an evaluation
     *            failure.
     */
    public byte evaluateByte(Individual i) throws
        EvaluationException {
        return ((Function) p).evaluateByte(i, arguments);
    }


    /**
     * Evaluate this Gene as Function returning a short.
     *
     * @param i the individual to which this Gene belongs
     * @return A short which is the result of the evaluation.
     * @exception EvaluationException If there is an evaluation
     *            failure.
     */
    public short evaluateShort(Individual i) throws
        EvaluationException {
        return ((Function) p).evaluateShort(i, arguments);
    }


    /**
     * Evaluate this Gene as Function returning an int.
     *
     * @param i the individual to which this Gene belongs
     * @return An int which is the result of the evaluation.
     * @exception EvaluationException If there is an evaluation
     *            failure.
     */
    public int evaluateInt(Individual i) throws
        EvaluationException {
        return ((Function) p).evaluateInt(i, arguments);
    }


    /**
     * Evaluate this Gene as Function returning a long.
     *
     * @param i the individual to which this Gene belongs
     * @return A long which is the result of the evaluation.
     * @exception EvaluationException If there is an
     *            evaluation failure.
     */
    public long evaluateLong(Individual i) throws
        EvaluationException {
        return ((Function) p).evaluateLong(i, arguments);
    }
    /**
     * Evaluate this Gene as Function returning a float.
     *
     * @param i the individual to which this Gene belongs
     * @return A float which is the result of the evaluation.
     * @exception EvaluationException If there is an
     *            evaluation failure.
     */
    public float evaluateFloat(Individual i) throws
        EvaluationException {
        return ((Function) p).evaluateFloat(i, arguments);
    }


    /**
     * Evaluate this Gene as Function returning a double.
     *
     * @param i the individual to which this Gene belongs
     * @return A double which is the result of the evaluation.
     * @exception EvaluationException If there is an
     *            evaluation failure.
     */
    public double evaluateDouble(Individual i) throws
        EvaluationException {
        return ((Function) p).evaluateDouble(i, arguments);
    }


    /**
     * Evaluate this Gene as Function returning a char.
     *
     * @param i the individual to which this Gene belongs
     * @return A char which is the result of the evaluation.
     * @exception EvaluationException If there is an
     *            evaluation failure.
     */
    public char evaluateChar(Individual i) throws
        EvaluationException {
        return ((Function) p).evaluateChar(i, arguments);
    }


    /**
     * Evaluate this Gene as Function returning a boolean.
     *
     * @param i the individual to which this Gene belongs
     * @return A boolean which is the result of the evaluation.
     * @exception EvaluationException If there is an
     *            evaluation failure.
     */
    public boolean evaluateBoolean(Individual i) throws
        EvaluationException {
        return ((Function) p).evaluateBoolean(i, arguments);
    }


    /**
     * Get the maximum depth of this Gene tree.
     *
     * @return the maximum depth of the tree.
     */
    public int depth() {
        // the depth is initially 0
        int d = 0;

        // get the maximum depth of each branch
        for (int i = 0; i < arguments.length; i++) {
            int tmp = arguments[i].depth();
            d = (d < tmp) ? tmp : d;
        }

        // the maximum depth of this Gene is 1 + the maximum depth
        // of all the branches
        return d + 1;
    }


    /**
     * Get the number of Genes in this Gene tree.
     *
     * @return the number of Genes in this Gene tree.
     */
    public int complexity() {
        // the total number of Genes is initially one.
        int sum = 1;

        // add the total number of Genes in each argument
        for (int i = 0; i < arguments.length; i++) {
            sum += arguments[i].complexity();
        }
        // the total is this sum
        return sum;
    }


    /**
     * Added by Rett Vandenberg
     * Get the number of introns in this Gene tree.
     *
     * @return the number of introns in this Gene tree.
     */
    public int introns() {
        int sum = 0;

        // if the gene starting the count is an intron
        // the second condition should never have to be used,
        // especially if the definition of introns is expanded
        // to other functions besides Nop
        if (this.isIntron() ||
            this.toString().startsWith("(Nop")) {
            sum += 1;
        }

        // add the total number of introns in each argument
        for (int i = 0; i < arguments.length; i++) {
            sum += arguments[i].introns();
        }
        // the total is this sum
        return sum;
    }

    /**
     * Added by Rett Vandenberg
     * Get the number of Dominants in this Gene tree.
     *
     * @return the number of dominants in this Gene tree.
     */
    public int dominants() {
        int sum = 0;

        // if the gene starting the count is a dominant
        if (this.isDominant()) { sum += 1; }

        // add the total number of dominants in each argument
        for (int i = 0; i < arguments.length; i++) {
            sum += arguments[i].dominants();
        }
        // the total is this sum
        return sum;
    }


    /**
     * Added by Rett Vandenberg
     * Get the number of Recessives in this Gene tree.
     *
     * @return the number of recessives in this Gene tree.
     */
    public int recessives() {
        int sum = 0;

        // if the gene starting the count is a recessive
        if (this.isRecessive()) { sum += 1; }

        // add the total number of recessives in each argument
        for (int i = 0; i < arguments.length; i++) {
            sum += arguments[i].recessives();
        }
        // the total is this sum
        return sum;
    }


    /**
     * Make a clone of this Gene tree, cloning all subtrees.
     *
     * @return a clone of this Gene.
     */
    public Gene deepClone() {
        // the clone is initially empty
        GeneFunction clone = null;
        try {
            // now clone the Gene, and then make arguments refer to
            // clones too
            clone = (GeneFunction) this.clone();
            clone.p = p.instance();
            clone.arguments = new Gene[arguments.length];
            for (int i = 0; i < arguments.length; i++) {
                clone.arguments[i] = (Gene) arguments[i].deepClone();
            }
        }
        catch (CloneNotSupportedException e) {
            // will never happen as long as we implement Cloneable,
            // which we do ..!
        }
        return clone;
    }

    /**
     * Create a String representing this Gene tree.
     *
     * @return a String representing this tree.
     */
    public String toString() {
        String s = new String();
        s = "(" + p;
        for (int i = 0; i < arguments.length; i++) {
            s += " " + arguments[i];
        }
        s += ")";
        return s;
    }


}//End Gene Function 
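depth(), complexity(), introns(), dominants() and recessives() all follow one pattern: measure something at the current node, then add (or, for depth, take the maximum of) the same quantity over every argument subtree. The sketch below reproduces that recursion on a hypothetical GeneNode class that stores only a name and its children. As in the listing, a function named Nop counts as an intron and a function whose name starts with "R" counts as a recessive; treating every other function as dominant, and giving leaves a depth of 0 without counting them in any category, are assumptions of this sketch rather than facts about GeneTerminal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified Gene tree: a name plus zero or more argument subtrees.
// Leaves (no arguments) stand in for terminals.
class GeneNode {
    final String name;
    final List<GeneNode> arguments;

    GeneNode(String name, GeneNode... arguments) {
        this.name = name;
        this.arguments = new ArrayList<>(Arrays.asList(arguments));
    }

    boolean isFunction() {
        return !arguments.isEmpty();
    }

    // Assumption: a leaf has depth 0, so a function is 1 + its deepest argument.
    int depth() {
        if (!isFunction()) return 0;
        int d = 0;
        for (GeneNode a : arguments) d = Math.max(d, a.depth());
        return d + 1;
    }

    // This node counts once, plus every node in each argument subtree.
    int complexity() {
        int sum = 1;
        for (GeneNode a : arguments) sum += a.complexity();
        return sum;
    }

    // A function named "Nop" is an intron.
    int introns() {
        int sum = (isFunction() && name.equals("Nop")) ? 1 : 0;
        for (GeneNode a : arguments) sum += a.introns();
        return sum;
    }

    // A function whose name starts with "R" is a recessive.
    int recessives() {
        int sum = (isFunction() && name.startsWith("R")) ? 1 : 0;
        for (GeneNode a : arguments) sum += a.recessives();
        return sum;
    }

    // Any other function is treated as dominant in this sketch.
    int dominants() {
        boolean dominant = isFunction() && !name.equals("Nop") && !name.startsWith("R");
        int sum = dominant ? 1 : 0;
        for (GeneNode a : arguments) sum += a.dominants();
        return sum;
    }
}
```

For the tree (Add (RMul x x) (Nop x 1)) this sketch reports a depth of 2, a complexity of 7, and exactly one intron, one recessive and one dominant.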


4. GENEFUNCTIONFULL


/*
 * Copyright Adil Qureshi - 1997
 * This code is part of gpsys Release 1.1
 * and is released for non-commercial use only.
 * Questions, comments etc should be forwarded to :-
 *
 *     Adil Qureshi
 *     University College London,
 *     Department of Computer Science,
 *     Gower St,
 *     London WC1E 6BT, UK.
 *     Tel: +44 (0)171 380 7777 x4436
 *     Fax: +44 (0)171 387 1397
 *     email: A.Qureshi@cs.ucl.ac.uk
 *     URL : http://www.cs.ucl.ac.uk/staff/A.Qureshi/
 *
 * Modified by Captain Loretta Vandenberg
 * Naval Postgraduate School
 * Monterey, CA 93943
 */

package gpsys;


/**
 * A GeneFunctionFull is a GeneFunction tree created using the Full
 * method.
 * In the Full method, the tree is full depth along any path from the
 * root to a leaf.
 *
 * @version 1.1, 30th June '97
 * @author <a href="mailto:A.Qureshi@cs.ucl.ac.uk">Adil Qureshi</a>
 * <address>Department of Computer Science,</address>
 * <address>University College London,</address>
 * <address>Gower St,</address>
 * <address>London WC1E 6BT,</address>
 * <address>UK.</address>
 *
 * @author <a href="mailto:shirley@cs.nps.navy.mil">Rett Vandenberg</a>
 * <address>Department of Computer Science</address>
 * <address>Naval Postgraduate School</address>
 * <address>Monterey, CA 93943</address>
 */
public class GeneFunctionFull extends GeneFunction implements Cloneable
{


    /**
     * Create a Gene tree using the Full method. The Full method
     * tries to create trees of the maximum specified depth along
     * any branch.
     *
     * @param maxDepth     the maximum depth of the Gene
     *                     tree to be generated.
     * @param type         the required return type of the top
     *                     node in the tree.
     * @param gpParameters the GP parameters for this run.
     * @param adfIndex     the index of the
     *                     ChromosomeParameters to be used.
     * @exception TypeException If a Function or Terminal of a
     *                          required type could not be found.
     */
    public GeneFunctionFull(int maxDepth, Type type,
                            GPParameters gpParameters, int adfIndex)
        throws TypeException {


        // get an array of all functions of the required type
        Function[] functionsOfType = gpParameters.adf[adfIndex].
            functionsOfTypeAtDepthFull[maxDepth].
            get(type);

        if (functionsOfType == null) {
            throw new TypeException(
                "no function of type " + type.toString()
                + " at depth " + maxDepth);
        }

        // then select a function at random from this array
        // and assign to p
        int code = gpParameters.rng.nextInt() % functionsOfType.length;
        code = (code < 0) ? -code : code;


        // added by Rett Vandenberg to count the introns
        // will have to change if adding different types of introns, but
        // this will do for this particular experiment
        // (equals() compares the names by content, not by reference)
        if (functionsOfType[code].toString().equals("Nop")) {
            super.intron = true;
            // an intron can never be a dominant gene
            super.dominant = false;

        // All recessive functions will be implemented starting with
        // the letter "R"
        } else if (functionsOfType[code].toString().startsWith("R")) {
            super.dominant = false;
            // in this particular case, a recessive still adds to the
            // fitness of the individual, so it cannot be considered an
            // intron
            super.intron = false;
        } else { // they are dominants
            super.intron = false;
            super.dominant = true;
            // at creation, there is a 50% chance of a double dominant
            // and a 50% chance of a heterozygous dominant
            if ((gpParameters.rng.nextInt() % 2) == 0) {
                super.allele = 2; // double dominant
            } else { super.allele = 1; }
        }

        Function f = (Function) functionsOfType[code].instance();
        p = f;


        // now generate the arguments to this function
        arguments = new Gene[f.argTypes.length];
        maxDepth--;
        for (int i = 0; i < arguments.length; i++) {
            if (maxDepth > 0) {
                // if there is depth make another GeneFunction
                try {
                    arguments[i] = new GeneFunctionFull(maxDepth,
                                                        f.argTypes[i],
                                                        gpParameters,
                                                        adfIndex);
                }
                catch (TypeException e) {
                    gpParameters.observer.diagnosticUpdate(
                        "GeneFunctionFull " + e.getMessage());
                    arguments[i] = new
                        GeneTerminal(maxDepth, f.argTypes[i],
                                     gpParameters, adfIndex);
                }
            } else { // else make a GeneTerminal
                arguments[i] = new GeneTerminal(maxDepth,
                                                f.argTypes[i],
                                                gpParameters,
                                                adfIndex);
            }
        }
    }


}//End GeneFunctionFull 
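The tagging block in the constructor relies purely on a naming convention: the function name "Nop" marks an intron, a leading "R" marks a recessive, and every other function is a dominant whose allele is set to 2 (double dominant) or 1 (heterozygous) with equal probability. A minimal sketch of just that decision follows; GeneTag and classify are hypothetical names, java.util.Random stands in for gpParameters.rng, nextBoolean() replaces the nextInt() % 2 test, and leaving allele at 0 for non-dominants is simply the Java default in this sketch.

```java
import java.util.Random;

// Hypothetical holder for the three tags the constructor sets on a gene.
class GeneTag {
    boolean intron;
    boolean dominant;
    int allele; // 2 = double dominant, 1 = heterozygous, 0 = not set here

    // Same naming convention as the listing: "Nop" is an intron, a leading
    // "R" marks a recessive, and anything else is a dominant.
    static GeneTag classify(String functionName, Random rng) {
        GeneTag t = new GeneTag();
        if (functionName.equals("Nop")) {
            t.intron = true;
            t.dominant = false;      // an intron can never be dominant
        } else if (functionName.startsWith("R")) {
            t.intron = false;        // a recessive still affects fitness
            t.dominant = false;
        } else {
            t.intron = false;
            t.dominant = true;
            // 50% double dominant, 50% heterozygous dominant
            t.allele = rng.nextBoolean() ? 2 : 1;
        }
        return t;
    }
}
```

Because the test is on names alone, adding a new recessive primitive only requires giving it a name that starts with "R", as RMul and RAdd do.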


5. GENEFUNCTIONGROW


/*
 * Copyright Adil Qureshi - 1997
 * This code is part of gpsys Release 1.1
 * and is released for non-commercial use only.
 * Questions, comments etc should be forwarded to :-
 *
 *     Adil Qureshi
 *     University College London,
 *     Department of Computer Science,
 *     Gower St,
 *     London WC1E 6BT, UK.
 *     Tel: +44 (0)171 380 7777 x4436
 *     Fax: +44 (0)171 387 1397
 *     email: A.Qureshi@cs.ucl.ac.uk
 *     URL : http://www.cs.ucl.ac.uk/staff/A.Qureshi/
 *
 * Modified by Captain Loretta Vandenberg
 * Naval Postgraduate School
 * Monterey, CA 93943
 */

package gpsys;


/**
 * A GeneFunctionGrow is a GeneFunction created using the Grow method.
 * In the Grow method, the Gene tree is such that, along any branch,
 * the probability of encountering a Terminal is equal to the
 * probability of encountering a Function.
 *
 * @version 1.1, 30th June '97
 * @author <a href="mailto:A.Qureshi@cs.ucl.ac.uk">Adil Qureshi</a>
 * <address>Department of Computer Science,</address>
 * <address>University College London,</address>
 * <address>Gower St,</address>
 * <address>London WC1E 6BT,</address>
 * <address>UK.</address>
 *
 * @author <a href="mailto:shirley@cs.nps.navy.mil">Rett Vandenberg</a>
 * <address>Department of Computer Science</address>
 * <address>Naval Postgraduate School</address>
 * <address>Monterey, CA 93943</address>
 */
public class GeneFunctionGrow extends GeneFunction implements Cloneable
{


    /**
     * Create a Gene tree using the Grow method. The Grow method
     * tries to create trees in which, along any branch, the
     * probability of encountering a Terminal is equal to the
     * probability of encountering a Function.
     *
     * @param maxDepth     the maximum depth of the Gene
     *                     tree to be generated.
     * @param type         the required return type of the top
     *                     node in the tree.
     * @param gpParameters the GP parameters for this run.
     * @param adfIndex     the index of the
     *                     ChromosomeParameters to be used.
     * @exception TypeException If a Function or Terminal of
     *                          a required type could not be found.
     */
    public GeneFunctionGrow(int maxDepth, Type type,
                            GPParameters gpParameters, int adfIndex)
        throws TypeException {


        // get an array of all functions of the required type
        Function[] functionsOfType = gpParameters.adf[adfIndex].
            functionsOfTypeAtDepthGrow[maxDepth].
            get(type);
        if (functionsOfType == null) throw new TypeException(
            "no function of type " + type.toString() + " at depth " +
            maxDepth);

        // then select a function at random from this array and
        // assign to p
        int code = gpParameters.rng.nextInt() % functionsOfType.length;
        code = (code < 0) ? -code : code;


        // added by Rett Vandenberg to count the introns
        // (equals() compares the names by content, not by reference)
        if (functionsOfType[code].toString().equals("Nop")) {
            super.intron = true;
            super.dominant = false;
        } else if (functionsOfType[code].toString().startsWith("R")) {
            super.dominant = false;
            super.intron = false;
        } else {
            super.intron = false;
            super.dominant = true;
            if ((gpParameters.rng.nextInt() % 2) == 0) {
                super.allele = 2; // double dominant
            } else { super.allele = 1; }
        }

        Function f = (Function) functionsOfType[code].instance();
        p = f;


        // now generate the arguments to this function
        arguments = new Gene[f.argTypes.length];
        maxDepth--;

        // just for convenience
        TypeToFunctionsTable typeToFunctions =
            gpParameters.adf[adfIndex].
            functionsOfTypeAtDepthGrow[maxDepth];

        for (int i = 0; i < arguments.length; i++) {
            // depth allowing, choose a function or a
            // terminal with a 50% chance for either
            if ((maxDepth > 0) &&
                ((gpParameters.rng.nextInt() % 2) == 0) &&
                (typeToFunctions.get(f.argTypes[i]) != null)) {
                try {
                    arguments[i] = new GeneFunctionGrow(maxDepth,
                                                        f.argTypes[i],
                                                        gpParameters,
                                                        adfIndex);
                }
                catch (TypeException e) {
                    gpParameters.observer.diagnosticUpdate(
                        "GeneFunctionGrow " + e.getMessage());
                    arguments[i] = new GeneTerminal(maxDepth,
                                                    f.argTypes[i],
                                                    gpParameters,
                                                    adfIndex);
                }
            } else {
                try {
                    arguments[i] = new GeneTerminal(maxDepth,
                                                    f.argTypes[i],
                                                    gpParameters,
                                                    adfIndex);
                }
                catch (TypeException e) {
                    gpParameters.observer.diagnosticUpdate(
                        "GeneFunctionGrow " + e.getMessage());
                    arguments[i] = new GeneFunctionGrow(maxDepth,
                                                        f.argTypes[i],
                                                        gpParameters,
                                                        adfIndex);
                }
            }
        }
    }
}//End GeneFunctionGrow 
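GeneFunctionFull and GeneFunctionGrow differ only in how each argument slot is filled once the depth budget has been decremented: Full always adds another function while depth remains, whereas Grow flips a coin at every slot. The sketch below mimics that control flow on hypothetical binary trees that carry structure only (no types, no primitives, and none of the TypeException fallbacks), returning each tree's depth and node count. TreeShape, full and grow are illustrative names, and leaves are counted at depth 0 as an assumption.

```java
import java.util.Random;

// Hypothetical structural model of the Full and Grow methods for a
// binary function set. Each call returns {depth, nodeCount}.
class TreeShape {
    static final int[] LEAF = {0, 1};

    static int[] full(int maxDepth) {
        maxDepth--;                       // children get one less level
        int depth = 0;
        int count = 1;                    // this function node
        for (int i = 0; i < 2; i++) {
            // Full: another function whenever depth remains
            int[] child = (maxDepth > 0) ? full(maxDepth) : LEAF;
            depth = Math.max(depth, child[0]);
            count += child[1];
        }
        return new int[] {depth + 1, count};
    }

    static int[] grow(int maxDepth, Random rng) {
        maxDepth--;
        int depth = 0;
        int count = 1;
        for (int i = 0; i < 2; i++) {
            // Grow: depth allowing, a 50/50 choice of function or terminal
            boolean makeFunction = (maxDepth > 0) && rng.nextBoolean();
            int[] child = makeFunction ? grow(maxDepth, rng) : LEAF;
            depth = Math.max(depth, child[0]);
            count += child[1];
        }
        return new int[] {depth + 1, count};
    }
}
```

With a depth budget of 3, full always yields a tree of depth 3 with 15 nodes, while grow yields trees of depth 1 to 3 with between 3 and 15 nodes, which is why the two methods are usually mixed when building an initial population.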

6. XNO_TYPE


/*
 * Captain Loretta Vandenberg
 */

package gpsys.primitives;

import gpsys.*;


/**
 * <pre>
 * NO_TYPE XNo_Type
 * </pre>
 *
 * XNo_Type is a Terminal representing a variable of Type No_Type.
 *
 * @see gpsys.Type
 * @see gpsys.Primitive
 * @see gpsys.Terminal
 *
 * @version 1.1, Aug 99
 * @author <a href="mailto:shirley@cs.nps.navy.mil">Rett
 * Vandenberg</a>
 * <address>Department of Computer Science</address>
 * <address>Naval Postgraduate School</address>
 * <address>Monterey, CA 93943</address>
 */
public class XNo_Type extends Terminal {
    /**
     * Holds the value of the variable.
     */
    Type nothing;

    /**
     * Construct a new variable.
     */
    public XNo_Type() {
        this.type = Type.NO_TYPE;
    }


    /**
     * Evaluates the variable, which returns its value.
     *
     * @param i The individual being evaluated.
     *
     * @return The Type NO_TYPE, which is the value of the variable.
     */
    public final Type evaluateNo_Type(Individual i) {
        //throws EvaluationException {
        return Type.NO_TYPE;
    }

    /**
     * Creates a copy of this Terminal.
     *
     * @return Actually a reference to the same Terminal, since there
     *         is no change in instance variables required.
     */
    public final Primitive instance() {
        return this;
    }


    /**
     * Creates a String representing this Terminal.
     *
     * @return A String containing the name of this Terminal.
     */
    public String toString() {
        return "No_Type";
    }
}//End No_Type


7. NOP


/*
 * Loretta Vandenberg
 */

package gpsys.primitives;

import gpsys.*;


/**
 * <pre>
 * &lt;typeX&gt; Nop()
 * </pre>
 *
 * Nop is a generic Function that doesn't do anything,
 * but can just fill a node.
 *
 * @see gpsys.Type
 * @see gpsys.Primitive
 * @see gpsys.Function
 *
 * @version 1.1, 22 Aug 99
 * @author <a href="mailto:shirley@cs.nps.navy.mil">Rett
 * Vandenberg</a>
 * <address>Department of Computer Science</address>
 * <address>Naval Postgraduate School</address>
 */
public class Nop extends Function {


    /**
     * Construct a new Nop Function.
     */
    public Nop(Type type) {
        this.type = type;
        this.argTypes = new Type[2];
        argTypes[0] = type;
        argTypes[1] = type;
    }

    /**
     * Used for arguments of FLOAT Types. The arguments are not
     * evaluated.
     *
     * @param i         The individual being evaluated.
     * @param arguments The Gene trees representing the arguments.
     * @return 0.0f, regardless of the arguments.
     * @exception EvaluationException If there is an
     *            evaluation failure.
     */
    public final float evaluateFloat(Individual i, Gene[] arguments)
        throws EvaluationException {
        return 0.0f;
    }


    /**
     * Used for arguments of INT Types. The arguments are not
     * evaluated.
     *
     * @param i         The individual being evaluated.
     * @param arguments The Gene trees representing the arguments.
     * @return 0, regardless of the arguments.
     * @exception EvaluationException If there is an
     *            evaluation failure.
     */
    public final int evaluateInt(Individual i, Gene[] arguments)
        throws EvaluationException {
        return 0;
    }


//You can add more for the evaluation type needed. I only used 
//these two. 


    /**
     * Creates an instance of this Nop object. The Type instantiation
     * is preserved so that if a Nop is being cloned, the clone will
     * be instantiated with the same Type.
     *
     * @return A reference to this Object (yes, the same object, since
     *         the type information is to be the same, there are
     *         therefore no instance variables to be modified).
     */
    public final Primitive instance() {
        return this;
    }


    /**
     * Returns a String representation of this Function.
     *
     * @return A String containing the name of the Function.
     */
    public String toString() {
        return "Nop";
    }


}//End Nop 
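Because Nop returns 0 without evaluating either argument, nothing beneath a Nop can influence an individual's output, which is what makes those subtrees behave as non-coding introns. The sketch below illustrates this with a hypothetical float-valued expression hierarchy (SketchExpr, SketchVar, SketchAdd, SketchNop are illustrative names, not gpsys classes) and a static counter that records how many nodes are actually visited during one evaluation.

```java
// Hypothetical float-valued expression nodes; 'visited' counts how many
// nodes are actually evaluated.
abstract class SketchExpr {
    static int visited = 0;
    abstract float eval(float x);
}

class SketchVar extends SketchExpr {
    float eval(float x) { visited++; return x; }
}

class SketchAdd extends SketchExpr {
    final SketchExpr a, b;
    SketchAdd(SketchExpr a, SketchExpr b) { this.a = a; this.b = b; }
    float eval(float x) { visited++; return a.eval(x) + b.eval(x); }
}

// Behaves like Nop above: returns 0 and never evaluates its arguments.
class SketchNop extends SketchExpr {
    final SketchExpr a, b;
    SketchNop(SketchExpr a, SketchExpr b) { this.a = a; this.b = b; }
    float eval(float x) { visited++; return 0.0f; }
}
```

For the tree (Add x (Nop x (Add x x))) evaluated at x = 2, the result is 2 and only 3 of its 7 nodes are visited; the four nodes under the Nop are carried along by crossover and mutation without affecting fitness.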
8. RMUL 
/*
 * Captain Loretta Vandenberg
 */

package gpsys.primitives;

import gpsys.*;


/**
 * <pre>
 * &lt;typeX&gt; RMul(&lt;typeX&gt; num1, &lt;typeX&gt; num2)
 * </pre>
 *
 * RMul is a generic function that returns the result of multiplying
 * both arguments.
 * RMul needs to be Type instantiated during construction to work with
 * particular argument Types. The supported argument Types include
 * BYTE, SHORT, INT, LONG, FLOAT and DOUBLE.
 *
 * @see gpsys.Type
 * @see gpsys.Primitive
 * @see gpsys.Function
 *
 * @version 1.1, Sep 99
 * @author <a href="mailto:shirley@cs.nps.navy.mil">Rett
 * Vandenberg</a>
 * <address>Department of Computer Science</address>
 * <address>Naval Postgraduate School</address>
 * <address>Monterey, CA 93943</address>
 */
public class RMul extends Function {


    /**
     * Constructs an RMul Function that works with the specified Type.
     * The supported Types include BYTE, SHORT, INT, LONG, FLOAT and
     * DOUBLE.
     *
     * @param type The Type of this RMul Function.
     */
    public RMul(Type type) {
        this.type = type;
        argTypes = new Type[2];
        argTypes[0] = type;
        argTypes[1] = type;
    }

    /**
     * Used to multiply arguments of BYTE Types.
     *
     * @param i         The individual being evaluated.
     * @param arguments The Gene trees representing the
     *                  arguments to be multiplied.
     * @return A byte representing the product of the arguments.
     * @exception EvaluationException If there is an
     *            evaluation failure.
     */
    public final byte evaluateByte(Individual i, Gene[] arguments)
        throws EvaluationException {
        return (byte) (arguments[0].evaluateByte(i) *
                       arguments[1].evaluateByte(i));
    }


    /**
     * Used to multiply arguments of SHORT Types.
     *
     * @param i         The individual being evaluated.
     * @param arguments The Gene trees representing the
     *                  arguments to be multiplied.
     * @return A short representing the product of the arguments.
     * @exception EvaluationException If there is an evaluation
     *            failure.
     */
    public final short evaluateShort(Individual i, Gene[] arguments)
        throws EvaluationException {
        return (short) (arguments[0].evaluateShort(i) *
                        arguments[1].evaluateShort(i));
    }


    /**
     * Used to multiply arguments of INT Types.
     *
     * @param i         The individual being evaluated.
     * @param arguments The Gene trees representing the arguments to be
     *                  multiplied.
     * @return An int representing the product of the arguments.
     * @exception EvaluationException If there is an evaluation
     *            failure.
     */
    public final int evaluateInt(Individual i, Gene[] arguments)
        throws EvaluationException {
        return arguments[0].evaluateInt(i) *
               arguments[1].evaluateInt(i);
    }


    /**
     * Used to multiply arguments of LONG Types.
     *
     * @param i         The individual being evaluated.
     * @param arguments The Gene trees representing the arguments to be
     *                  multiplied.
     * @return A long representing the product of the arguments.
     * @exception EvaluationException If there is an
     *            evaluation failure.
     */
    public final long evaluateLong(Individual i, Gene[] arguments)
        throws EvaluationException {
        return arguments[0].evaluateLong(i) *
               arguments[1].evaluateLong(i);
    }


    /**
     * Used to multiply arguments of FLOAT Types.
     *
     * @param i         The individual being evaluated.
     * @param arguments The Gene trees representing the
     *                  arguments to be multiplied.
     * @return A float representing the product of the arguments.
     * @exception EvaluationException If there is an
     *            evaluation failure.
     */
    public final float evaluateFloat(Individual i, Gene[] arguments)
        throws EvaluationException {
        return arguments[0].evaluateFloat(i) *
               arguments[1].evaluateFloat(i);
    }


    /**
     * Used to multiply arguments of DOUBLE Types.
     *
     * @param i         The individual being evaluated.
     * @param arguments The Gene trees representing the arguments to be
     *                  multiplied.
     * @return A double representing the product of the arguments.
     * @exception EvaluationException If there is an evaluation
     *            failure.
     */
    public final double evaluateDouble(Individual i, Gene[] arguments)
        throws EvaluationException {
        return arguments[0].evaluateDouble(i) *
               arguments[1].evaluateDouble(i);
    }


    /**
     * Creates an instance of this RMul object. The Type instantiation
     * is preserved so that if an INT RMul is being cloned, the clone
     * will also be INT instantiated.
     *
     * @return A reference to this Object (yes, the same object, since
     *         the type information is to be the same, there are
     *         therefore no instance variables to be modified).
     */
    public final Primitive instance() {
        return this;
    }
    /**
     * Returns a String representation of this Function.
     *
     * @return A String containing the name of the Function.
     */
    public String toString() {
        return "RMul";
    }

}//End RMul


9. RADD


/*
 * Captain Loretta Vandenberg
 */

package gpsys.primitives;
import gpsys.*;


/**
 * <pre>
 * &lt;typeX&gt; RAdd(&lt;typeX&gt; num1, &lt;typeX&gt; num2)
 * </pre>
 *
 * RAdd is a generic function that returns the result of adding both
 * arguments, but does not perform as efficiently as Add. It has
 * been added to demonstrate the effect of dominant/recessive
 * genes within a GP.
 * RAdd needs to be Type instantiated during construction to work with
 * particular argument Types. The supported argument Types include
 * BYTE, SHORT, INT, LONG, FLOAT and DOUBLE.
 *
 * @see gpsys.Type
 * @see gpsys.Primitive
 * @see gpsys.Function
 *
 * @version 1.1, 1 Sep 99
 * @author <a href="mailto:shirley@cs.nps.navy.mil">Rett
 * Vandenberg</a>
 * <address>Department of Computer Science</address>
 * <address>Naval Postgraduate School</address>
 * <address>Monterey, CA 93943</address>
 */
public class RAdd extends Function {


    /**
     * Constructs an RAdd Function that works with the specified Type.
     * The supported Types include BYTE, SHORT, INT, LONG, FLOAT and
     * DOUBLE.
     *
     * @param type The Type of this RAdd Function.
     */
    public RAdd(Type type) {
        this.type = type;
        argTypes = new Type[2];
        argTypes[0] = type;
        argTypes[1] = type;
    }


    /**
     * Used to add arguments of BYTE Types.
     *
     * @param i         The individual being evaluated.
     * @param arguments The Gene trees representing the arguments to be
     *                  added.
     * @return A byte representing the sum of the arguments.
     * @exception EvaluationException If there is an evaluation
     *            failure.
     */
    public final byte evaluateByte(Individual i, Gene[] arguments)
        throws EvaluationException {
        return (byte) (arguments[0].evaluateByte(i) +
                       arguments[1].evaluateByte(i));
    }


    /**
     * Used to add arguments of SHORT Types.
     *
     * @param i         The individual being evaluated.
     * @param arguments The Gene trees representing the arguments to be
     *                  added.
     * @return A short representing the sum of the arguments.
     * @exception EvaluationException If there is an evaluation
     *            failure.
     */
    public final short evaluateShort(Individual i, Gene[] arguments)
        throws EvaluationException {
        return (short) (arguments[0].evaluateShort(i) +
                        arguments[1].evaluateShort(i));
    }


    /**
     * Used to add arguments of INT Types.
     *
     * @param i         The individual being evaluated.
     * @param arguments The Gene trees representing the arguments to be
     *                  added.
     * @return An int representing the sum of the arguments.
     * @exception EvaluationException If there is an evaluation
     *            failure.
     */
    public final int evaluateInt(Individual i, Gene[] arguments)
        throws EvaluationException {
        return arguments[0].evaluateInt(i) +
               arguments[1].evaluateInt(i);
    }


    /**
     * Used to add arguments of LONG Types.
     *
     * @param i         The individual being evaluated.
     * @param arguments The Gene trees representing the arguments to be
     *                  added.
     * @return A long representing the sum of the arguments.
     * @exception EvaluationException If there is an evaluation
     *            failure.
     */
    public final long evaluateLong(Individual i, Gene[] arguments)
        throws EvaluationException {
        return arguments[0].evaluateLong(i) +
               arguments[1].evaluateLong(i);
    }


    /**
     * Used to add arguments of FLOAT Types.
     *
     * @param i         The individual being evaluated.
     * @param arguments The Gene trees representing the arguments to be
     *                  added.
     * @return A float representing the sum of the arguments.
     * @exception EvaluationException If there is an evaluation
     *            failure.
     */
    public final float evaluateFloat(Individual i, Gene[] arguments)
        throws EvaluationException {
        return arguments[0].evaluateFloat(i) +
               arguments[1].evaluateFloat(i);
    }


    /**
     * Used to add arguments of DOUBLE Types.
     *
     * @param i         The individual being evaluated.
     * @param arguments The Gene trees representing the arguments to be
     *                  added.
     * @return A double representing the sum of the arguments.
     * @exception EvaluationException If there is an evaluation
     *            failure.
     */
    public final double evaluateDouble(Individual i, Gene[] arguments)
        throws EvaluationException {
        return arguments[0].evaluateDouble(i) +
               arguments[1].evaluateDouble(i);
    }


    /**
     * Creates an instance of this RAdd object. The Type instantiation
     * is preserved so that if an INT RAdd is being cloned, the clone
     * will also be INT instantiated.
     *
     * @return A reference to this Object (yes, the same object, since
     *         the type information is to be the same, there are
     *         therefore no instance variables to be modified).
     */
    public final Primitive instance() { return this; }
/**
* Returns a String representation of this Function.
*
* @return A String containing the name of the Function.
*/
public String toString() {return "RAdd";}


}//End RAdd 
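RAdd is a stateless function primitive: each typed evaluate method sums the values of its two argument subtrees, and instance() returns the same object because there is no per-node data to copy. The following is a minimal, self-contained sketch of that pattern under stated assumptions; SketchNode, SketchAdd, SketchFunctionNode, SketchVarX and SketchOne are hypothetical stand-ins written for illustration and are not the GPsys Gene, Individual or Primitive API.

```java
// Hypothetical stand-in for a GPsys Gene tree node (not the real GPsys API).
interface SketchNode {
    int evaluateInt(int x);
    double evaluateDouble(double x);
}

// Stateless two-argument addition primitive, analogous to RAdd. Because it
// holds no per-node data, instance() can return the same object.
final class SketchAdd {
    static final SketchAdd INSTANCE = new SketchAdd();

    private SketchAdd() {}

    SketchAdd instance() { return this; }

    // Each typed method sums the values of its two argument subtrees.
    int evaluateInt(SketchNode[] args, int x) {
        return args[0].evaluateInt(x) + args[1].evaluateInt(x);
    }

    double evaluateDouble(SketchNode[] args, double x) {
        return args[0].evaluateDouble(x) + args[1].evaluateDouble(x);
    }
}

// Tree node pairing a shared primitive with its own argument subtrees.
final class SketchFunctionNode implements SketchNode {
    private final SketchAdd op;
    private final SketchNode[] args;

    SketchFunctionNode(SketchAdd op, SketchNode... args) {
        this.op = op;
        this.args = args;
    }

    public int evaluateInt(int x) { return op.evaluateInt(args, x); }
    public double evaluateDouble(double x) { return op.evaluateDouble(args, x); }
}

// Terminal returning the variable x.
final class SketchVarX implements SketchNode {
    public int evaluateInt(int x) { return x; }
    public double evaluateDouble(double x) { return x; }
}

// Terminal returning the constant 1.
final class SketchOne implements SketchNode {
    public int evaluateInt(int x) { return 1; }
    public double evaluateDouble(double x) { return 1.0; }
}
```

Sharing one primitive object among many tree nodes is safe in this sketch only because the primitive keeps no state; the argument subtrees live in the tree node instead.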


10. GEOSERIES 


/*
* Loretta Vandenberg
*/


package gpsys.series; 
import gpsys.*;
import java.io.*;


/**
* The user interface for the geometric series evaluation. This class
* also contains the main() function to execute the application.
*
* @version 1.1, 30th June '97
* @author <a href="mailto:shirley@cs.nps.navy.mil">Rett Vandenberg</a>
* <address>Department of Computer Science,</address>
* <address>Naval Postgraduate School</address>
* <address>Monterey, CA 93943</address>
*/

public class GeoSeries implements GPObserver { 

/**
* The filePrefix to use when saving reports and generation states
* of the evolution.
*/
String filePrefix;


/**
* Construct the geometric series expansion user interface using the
* specified file prefix.
*
* @param filePrefix The file prefix to be used for saving reports
* and generation states.
*/
public GeoSeries(String filePrefix) {this.filePrefix = filePrefix;} 


/**
* If the filePrefix is null, just write a report of the current
* generation to the standard output. Otherwise, also append the
* report to the file "filePrefix.txt" and save the current
* generation to disk via GPParameters.save(filePrefix).
*
* @param gpParameters The GP parameters used for this run.
* @param how How the generation was created. Can be either
* CREATION, FROMSTREAM or EVOLVED.
*/


public void generationUpdate(GPParameters gpParameters, int how) { 
// if a file prefix was given, write the report to file and save 
// the current generation. This need not be done if the 
// generation was just loaded from file. 
if ((filePrefix != null) && (how == GPObserver.CREATION ||
how == GPObserver.EVOLVED)) {
// try to save the current generation
try {
diagnosticUpdate("Saving current generation..."); 
gpParameters.save(filePrefix); 
diagnosticUpdate("Saved current generation."); 
} 
catch (IOException e) { 
System.out.println("gpParameters.save() : " + e); 
System.exit(1); 
} 


// try to write a report of the current population to file
try {
gpParameters.writeReport(filePrefix, how == GPObserver.CREATION);
}
catch (IOException e) {
System.out.println("gpParameters.writeReport() : " + e);
System.exit(1);
}
}


// now write a report of the current population to the standard
// output
StringWriter sw = new StringWriter();
PrintWriter pw = new PrintWriter(sw);
gpParameters.writeReport(pw, (how == GPObserver.CREATION) ||
(how == GPObserver.FROMSTREAM));
System.out.print(sw.toString());
}


/**
* We are not interested in this update, so we just ignore it.
*
* @param gpParameters the GP parameters used for this run.
* @param i the Individual that has just been created.
* @param creationIndex the index of the Individual in the
* population.
*/
public void individualUpdate(GPParameters gpParameters,
Individual i, int creationIndex) {}


/**
* We are not interested in this update, so we just ignore it.
*
* @param gpParameters the GP parameters used for this run.
* @param individualIndex the index of the created Individual
* in the population.
* @param creationMethod how the Individual was created. Can
* be either VIA_MUTATION or VIA_CROSSOVER.
*/
public void individualUpdate(GPParameters gpParameters,
int individualIndex, int creationMethod) {}


/**
* Print the diagnostic message to the standard output.
*
* @param s The diagnostic message.
*/
public void diagnosticUpdate(String s) { System.out.println(s); } 


/**
* Print the exception and generate a stack trace on the standard
* output.
*
* @param e The exception that was generated.
*/
public void exception(GPException e) {
System.out.println(e.getMessage());
e.printStackTrace();
System.exit(0);
}


/**
* The main() method of the geometric series application. This
* application may be invoked in any of the following ways.
*
* <pre>
* java GeoSeries &lt;filePrefix&gt;
* java GeoSeries &lt;filePrefix&gt; &lt;generations&gt;
* java GeoSeries &lt;filePrefix&gt; &lt;RNG seed&gt;
* &lt;population&gt; &lt;generations&gt;
* java GeoSeries &lt;RNG seed&gt; &lt;population&gt;
* &lt;generations&gt;
* </pre>
*
* The first instructs the application to restart from the last saved
* session using the files with the prefix specified. The second is
* the same as the first, except that the maximum number of
* generations is modified as specified. The third instructs the
* application to start a new run using the specified file prefix for
* saves, and the specified parameters for the run. The last is
* similar to the previous invocation except that nothing is saved to
* disk, and it is as a result very fast.
*/
public static void main(String[] argv) {
GeoSeriesGPParameters gpParameters = null;
String filePrefix = null;


// now read the command line arguments 
switch (argv.length) {
case 1: // load entirely from saved session
case 2: // load from saved session but change the max
// generations
filePrefix = argv[0];
try {
System.out.println("Loading last saved generation...");
gpParameters = (GeoSeriesGPParameters)
GPParameters.load(filePrefix);
if (gpParameters != null)
System.out.println("Loaded last saved generation...");
}
catch (java.io.IOException e) {
System.out.println("Loading problem : " + e);
System.exit(1);
}
catch (ClassNotFoundException e) {
System.out.println("Loading problem : " + e);
System.exit(1);
}
if (argv.length == 2) {
int generations = Integer.parseInt(argv[1]);
gpParameters.generations = generations;
}
break;


case 3: { 
long rngSeed = Long.parseLong(argv[0]); 
int population = Integer.parseInt(argv[1]); 
int generations = Integer.parseInt(argv[2]);
gpParameters = new GeoSeriesGPParameters (rngSeed, 
population, generations); 


} 
break; 
case 4: {
filePrefix = argv[0];
long rngSeed = Long.parseLong(argv[1]);
int population = Integer.parseInt(argv[2]);
int generations = Integer.parseInt(argv[3]);
gpParameters = new GeoSeriesGPParameters(rngSeed,
population, generations);
}
break;
default:
System.out.println("Usage : GeoSeries <file>");
System.out.println("        GeoSeries <file> <generations>");
System.out.println("        GeoSeries " +
"<RNG seed> <population> <generations>");
System.out.println("        GeoSeries " +
"<file> <RNG seed> <population> <generations>");

return; 


} 


// set the observer to be an instance of our user interface 
gpParameters.observer = new GeoSeries(filePrefix) ; 


// create a new GP system 
GPsys gpSys = new GPsys(gpParameters) ; 
// start the evolution
gpSys.evolve();
} 
}//End GeoSeries 
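GeoSeries.main() chooses one of four start-up modes purely from the number of command-line arguments. The sketch below isolates that dispatch so the mapping can be checked without starting a GP run; ArgModeSketch, its Mode enum and runParameters() are hypothetical helpers written for illustration, not part of the original listing.

```java
// Hypothetical helper mirroring the argument-count dispatch in GeoSeries.main().
final class ArgModeSketch {
    enum Mode { RESUME, RESUME_WITH_GENERATIONS, NEW_RUN_NO_SAVE, NEW_RUN_WITH_SAVE, USAGE }

    static Mode classify(String[] argv) {
        switch (argv.length) {
            case 1: return Mode.RESUME;                  // <filePrefix>
            case 2: return Mode.RESUME_WITH_GENERATIONS; // <filePrefix> <generations>
            case 3: return Mode.NEW_RUN_NO_SAVE;         // <RNG seed> <population> <generations>
            case 4: return Mode.NEW_RUN_WITH_SAVE;       // <filePrefix> <RNG seed> <population> <generations>
            default: return Mode.USAGE;
        }
    }

    // For the 3- and 4-argument forms, returns {seed, population, generations};
    // when a file prefix is present it is argv[0] and is skipped.
    static long[] runParameters(String[] argv) {
        if (argv.length != 3 && argv.length != 4) {
            throw new IllegalArgumentException("expected 3 or 4 arguments");
        }
        int offset = argv.length - 3;
        long seed = Long.parseLong(argv[offset]);
        int population = Integer.parseInt(argv[offset + 1]);
        int generations = Integer.parseInt(argv[offset + 2]);
        return new long[] {seed, population, generations};
    }
}
```

Keeping the numeric parsing separate from the mode decision makes the one difference between the two "new run" forms explicit: only the position of the first numeric argument changes.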
11. GEOSERIESGPPARAMETERS


/*
* Loretta Vandenberg
*/


package gpsys.series; 
import gpsys.*; 


import java.util.Random;


/**
* The GPParameters class for the geometric series problem. Any
* instance of this class has all the GP parameters correctly set.
*
* @version ily ws? Aug 99 

* @author <a href="mailto:shirley@cs.nps.navy.mil">Rett Vandenberg</a>
* <address>Department of Computer Science,</address>
* <address>Naval Postgraduate School</address>
* <address>Monterey, CA 93943</address>
*/


public class GeoSeriesGPParameters extends GPParameters { 


/**
* Constructs a GPParameters object for the geometric series
* expansion problem.
*
* @param seed The seed to be used for the random number generator.
* If a seed value of 0 is supplied, a unique seed is
* generated using the current time.
* @param population The size of the population to be used.
* @param generations The maximum number of generations to be
* evolved.
*/
GeoSeriesGPParameters(long seed, int population, int generations ) { 


this.populationSize = population; 
this.generations = generations; 


// create a seed using the current time if the seed supplied is 0


if (seed == 0) {rngSeed = System.currentTimeMillis();} 
else {rngSeed = seed; } 
rng = new Random(rngSeed) ; 


// this has to be modified for each run if a different mutation
// probability is wanted
pMutation = 0.2;
tournamentSize = 7; 


adf = new ChromosomeParameters[1]; 
adf[0] = new GeoSeriesChromosomeParametersADF0(rng);
fitness = new GeoSeriesFitness();
}

}//End GeoSeriesGPParameters 
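The GeoSeriesGPParameters constructor treats a seed of 0 as a request for a time-based seed and otherwise uses the supplied value, so a run can be repeated by passing the same non-zero seed. A minimal sketch of that rule using java.util.Random; SeedSketch, chooseSeed() and firstDraws() are hypothetical names used only for illustration.

```java
import java.util.Random;

// Sketch of the seeding rule in the GeoSeriesGPParameters constructor.
final class SeedSketch {
    // 0 means "derive a seed from the current time"; anything else is kept.
    static long chooseSeed(long requested) {
        return (requested == 0) ? System.currentTimeMillis() : requested;
    }

    // First n values from a generator built on the chosen seed.
    static int[] firstDraws(long requested, int n) {
        Random rng = new Random(chooseSeed(requested));
        int[] draws = new int[n];
        for (int k = 0; k < n; k++) {
            draws[k] = rng.nextInt();
        }
        return draws;
    }
}
```

Two generators constructed with the same seed yield the same sequence, which is what makes a run with an explicit seed reproducible.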


12. GEOSERIESFITNESS 


/*
* Loretta Vandenberg
*/


package gpsys.series; 


import gpsys.*;
import gpsys.primitives.*;


/**
* The Fitness class for the series expansion problem. The fitness is
* measured by the absolute value of the difference between the real
* function and the function generated by the GP system. The fitness
* of one GP is deemed better than another if either the error is
* smaller, or the error is equal but the complexity is smaller.
* The termination criterion is met when the error is zero. This
* problem involves finding the function :-
* <pre>
* f(x) = x^n + x^(n-1) + ... + x^2 + x + 1
* </pre>
*
* @version ee Lone 29 
* @author <a href="mailto:shirley@cs.nps.navy.mil">Rett Vandenberg</a>
* <address>Department of Computer Science,</address>
* <address>Naval Postgraduate School</address>
* <address>Monterey, CA 93943</address>
*/
public class GeoSeriesFitness extends Fitness { 
/**
* The error between the ideal function and the GP.
*/
double fitness;

/**
* The complexity of the GP, i.e. the number of Genes it contains.
*/
int complexity;


/**
* The minimum value of x to be used for testing the evolved
* function.
*/
public static float from = 0.0f;


/**
* The maximum value of x to be used for testing the evolved
* function.
*/
public static float to = 10.0f;


/**
* The number of samples used to test the evolved function.
*/

public static int samples = 50; 


/**
* Construct a Fitness object with default Fitness.
*/
public GeoSeriesFitness() {
fitness = 0.0;
complexity = 0;
}


/**
* Constructs a Fitness object by evaluating an Individual.
*
* @param gpParameters The GP parameters for this run.
* @param i The individual to be evaluated.
*/
public GeoSeriesFitness(GPParameters gpParameters, Individual i) {
complexity = i.complexity();
// calculate the raw fitness 


fitness = 0.0;
float step = (to - from) / samples;


// get a reference to the XFloat terminal used to
// generate ADF0.
XFloat xFloat = (XFloat)
gpParameters.adf[0].terminals[0];
for (float j = from; j < to; j += step) {
xFloat.set(j);
float guess = 0.0f;
int power = gpParameters.rng.nextInt();
float real = 0.0f;


// sum x^power + ... + x^1; the constant term is added below
for (int pow = power; pow >= 1; pow--) {
real += (float) Math.pow(j, pow);

} 

//add the 1 


real += 1.0; 


try {
guess = i.evaluateFloat();
fitness += Math.abs(real - guess);
}
catch (DivideByZeroException e) {
// a divide by zero error is really bad,
// so don't bother with the remaining tests
fitness = Float.MAX_VALUE;
break;
}
catch (GPException e) {
gpParameters.observer.exception(e);
}
}
// Subtract from fitness for recessives.
// In this case, one would add
// fitness += i.recessives;
}


/**
* Adds a fitness value to this fitness value.
*
* @param f the fitness value to be added.
*/
public void add(Fitness f) {
fitness += ((GeoSeriesFitness)f).fitness;
complexity += ((GeoSeriesFitness)f).complexity;
recessives += ((GeoSeriesFitness)f).recessives;
}
/**
* Added by Rett Vandenberg
* Subtracts a fitness value from this fitness value.
*
* @param f the fitness value to be subtracted.
*/
public void subtract(Fitness f) {
fitness -= ((GeoSeriesFitness)f).fitness;
complexity -= ((GeoSeriesFitness)f).complexity;
recessives -= ((GeoSeriesFitness)f).recessives;
}


/**
* Divide the fitness by the specified integer. This is used by
* the GP system to calculate the average fitness of the
* population.
*
* @param divisor the integer to divide the fitness by.
*/
public void divide(int divisor) {
fitness /= divisor;
complexity /= divisor;
}


/**
* Tests if this fitness value is greater than another fitness
* value.
*
* @param f the fitness with which to compare.
* @return true if this fitness is higher than f, false otherwise.
*/
public boolean greaterThan(Fitness f) {
if (fitness < ((GeoSeriesFitness)f).fitness)
return true;
if (fitness == ((GeoSeriesFitness)f).fitness) 


return complexity < 
((GeoSeriesFitness) f).complexity; 
return false; 


} 


/**
* Tests if this fitness is less than another fitness.
*
* @param f the fitness with which to compare.
* @return true if this fitness is lower than f, false otherwise.
*/
public boolean lessThan(Fitness f) {
if (fitness > ((GeoSeriesFitness)f).fitness)
return true;
if (fitness == ((GeoSeriesFitness)f).fitness)


return complexity > 
((GeoSeriesFitness)f).complexity; 
return false; 


} 


/**
* Tests if this fitness is equal to another fitness.
*
* @param f the fitness with which to compare.
* @return true if f has the same fitness, false otherwise.
*/
public boolean equals(Fitness f) { 
return 
(fitness == ((GeoSeriesFitness)f).fitness) && 
(complexity == ((GeoSeriesFitness) f).complexity) ; 
} 
/**
* Creates a new instance of the Fitness object with a default
* fitness.
*
* @return an instance of a Fitness object with default fitness.
*/
public Fitness instance() {
return new GeoSeriesFitness();
}


/**
* Creates a new instance of the Fitness object which represents
* the fitness of the specified individual.
*
* @param gpParameters the parameters for this GP run.
* @param i the individual to be evaluated.
* @return The Fitness of the specified individual.
*/
public Fitness instance(GPParameters gpParameters, Individual i) {
return new GeoSeriesFitness(gpParameters, i);
}


/**
* Tests whether this fitness meets the termination criterion.
*
* @return true if the termination criterion has been met, false
* otherwise.
*/

public boolean terminationCondition() {return fitness == 0;} 


/**
* Converts the fitness into a String suitable for printing.
*
* @return A String representing the fitness.
*/
public String toString() {
return fitness + ""; // + complexity
//when testing
//return "Fitness(" + fitness + "," + complexity + ")";


} 


}//End GeoSeriesFitness 
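GeoSeriesFitness scores an individual by summing the absolute error between the target series and the evolved function at evenly spaced sample points, and breaks ties between equal errors by preferring the smaller tree. The sketch below shows that measure for the geometric series 1 + x + x^2 + ... + x^n. It assumes a fixed degree n (the listing draws the degree from the random number generator) and uses a hypothetical GeoFitnessSketch class, with the candidate supplied as a DoubleUnaryOperator in place of an evolved Individual.

```java
import java.util.function.DoubleUnaryOperator;

// Sketch of the GeoSeriesFitness error measure (assumes a fixed degree n).
final class GeoFitnessSketch {
    // Target series 1 + x + x^2 + ... + x^n, evaluated with Horner's rule.
    static double geometric(double x, int n) {
        double sum = 0.0;
        for (int k = 0; k <= n; k++) {
            sum = sum * x + 1.0;
        }
        return sum;
    }

    // Sum of |target(x) - guess(x)| at x = from + s * step, s = 0 .. samples - 1.
    static double error(DoubleUnaryOperator guess, int n,
                        double from, double to, int samples) {
        double step = (to - from) / samples;
        double total = 0.0;
        for (int s = 0; s < samples; s++) {
            double x = from + s * step;
            total += Math.abs(geometric(x, n) - guess.applyAsDouble(x));
        }
        return total;
    }

    // Lower error wins; equal errors are broken by lower complexity,
    // the same ordering as GeoSeriesFitness.greaterThan().
    static boolean better(double errA, int complexityA,
                          double errB, int complexityB) {
        if (errA < errB) return true;
        if (errA == errB) return complexityA < complexityB;
        return false;
    }
}
```

Stepping with an integer sample index, rather than accumulating j += step in floating point as the listing does, keeps the number of sample points fixed at samples regardless of rounding.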


13. GEOSERIESCHROMOSOMEPARAMETERSADF0

/*
* Loretta Vandenberg
*/


package gpsys.series; 


import java.util.Random;
import gpsys.*;
import gpsys.primitives.*;

/**
* The Chromosome parameters for ADF0 (the result producing branch) of
* the series expansion problem.
*
* ADF0 has the following Function and Terminal sets.
* <pre>
* Functions = {Add, Mul, RAdd, RMul, Nop}
* Terminals = {X, One}
* </pre>
*
* @version 1.1, 30th June '97
* @author <a href="mailto:shirley@cs.nps.navy.mil">Rett Vandenberg</a>
* <address>Department of Computer Science,</address>
* <address>Naval Postgraduate School</address>
* <address>Monterey, CA 93943</address>
*/

public class GeoSeriesChromosomeParametersADF0 extends
ChromosomeParameters {


/**
* Create the ChromosomeParameters object for ADF0.
*/
public GeoSeriesChromosomeParametersADF0(Random rng) {

maxDepth = 9;
maxDepthAtCreation = 7;
maxDepthMutation = 3;

// the return type of the Chromosome when it is evaluated
type = Type.FLOAT;

// the types used by this chromosome
types = new Type[2];
types[0] = Type.FLOAT;
types[1] = Type.BOOLEAN;
//types[2] = Type.NO_TYPE;

// define the function set
functions = new Function[5];
functions[0] = new Add(Type.FLOAT);
functions[1] = new Mul(Type.FLOAT);
functions[2] = new RAdd(Type.FLOAT);
functions[3] = new RMul(Type.FLOAT);
functions[4] = new Nop(Type.FLOAT);

// define the terminal set
terminals = new Terminal[2];
terminals[0] = new XFloat();
terminals[1] = new One(Type.FLOAT);
//terminals[2] = new XNo_Type();

createMethod = CREATE_RAMP_HALF_AND_HALF;
}
}//End GeoSeriesChromosomeParametersADF0
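ADF0 is created with CREATE_RAMP_HALF_AND_HALF, the name of Koza's ramped half-and-half initialisation: the population is spread over a range of depths, and at each depth roughly half the trees are built with the full method and half with the grow method. The sketch below is a generic illustration of that technique, not the GPsys implementation; the RampSketch class, its two-argument function set and its depth convention (a lone terminal has depth 1) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Generic sketch of ramped half-and-half tree creation (not the GPsys code).
final class RampSketch {
    // Assumed primitive sets: every function takes two arguments.
    static final String[] FUNCTIONS = {"Add", "Mul"};
    static final String[] TERMINALS = {"X", "One"};

    // A tree node: a primitive name plus its argument subtrees.
    static final class Tree {
        final String name;
        final List<Tree> args = new ArrayList<>();

        Tree(String name) { this.name = name; }

        // Depth of the tree; a lone terminal has depth 1.
        int depth() {
            int deepest = 0;
            for (Tree a : args) deepest = Math.max(deepest, a.depth());
            return deepest + 1;
        }
    }

    // full: a function at every level above the last, so depth is exactly 'depth'.
    // grow: above the last level, any function or terminal may be chosen.
    static Tree build(Random rng, int depth, boolean full) {
        int choices = FUNCTIONS.length + TERMINALS.length;
        boolean terminal = depth == 1
                || (!full && rng.nextInt(choices) >= FUNCTIONS.length);
        if (terminal) {
            return new Tree(TERMINALS[rng.nextInt(TERMINALS.length)]);
        }
        Tree t = new Tree(FUNCTIONS[rng.nextInt(FUNCTIONS.length)]);
        for (int a = 0; a < 2; a++) {
            t.args.add(build(rng, depth - 1, full));
        }
        return t;
    }

    // Cycles the target depth through 2..maxDepth; at each target depth one
    // tree is built with full and the next with grow.
    static List<Tree> population(Random rng, int size, int maxDepth) {
        if (maxDepth < 2) throw new IllegalArgumentException("maxDepth must be >= 2");
        int depthCount = maxDepth - 1;
        List<Tree> pop = new ArrayList<>();
        for (int k = 0; k < size; k++) {
            int depth = 2 + (k / 2) % depthCount;
            pop.add(build(rng, depth, k % 2 == 0));
        }
        return pop;
    }
}
```

With maxDepth set to 7 (the value of maxDepthAtCreation in the listing), the target depths in this sketch run from 2 to 7; full trees always reach their target depth, while grow trees may stop short of it.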